Abstract. In this paper we develop procedures for performing inference in regression
models about how potential policy interventions affect the entire marginal distribution of
an outcome of interest. These policy interventions consist of either changes in the
distribution of covariates related to the outcome holding the conditional distribution of the
outcome given covariates fixed, or changes in the conditional distribution of the outcome
given covariates holding the marginal distribution of the covariates fixed. Under either of
these assumptions, we obtain uniformly consistent estimates and functional central limit
theorems for the counterfactual and status quo marginal distributions of the outcome
as well as other function-valued effects of the policy, including, for example, the effects
of the policy on the marginal distribution function, quantile function, and other related
functionals. We construct simultaneous confidence sets for these functions; these sets take
into account the sampling variation in the estimation of the relationship between the
outcome and covariates. Our procedures rely on, and our theory covers, all the main regression
approaches for modeling and estimating conditional distributions, focusing especially on
classical, quantile, duration, and distribution regressions. Our procedures are general and
accommodate both simple unitary changes in the values of a given covariate and
changes of general form in the distribution of the covariates or in the conditional distribution
of the outcome given covariates. We apply the procedures to examine the effects of labor
market institutions on the U.S. wage distribution.
1. Introduction

A basic objective in empirical economics is to predict the effect of a potential policy 
intervention or a counterfactual change in economic conditions on some outcome variable 
of interest. For example, we might be interested in what the wage distribution would be
in 2000 if workers had the same characteristics as in 1990, what the distribution of infant
birth weights would be for black mothers if they received the same amount of prenatal care
as white mothers, what the distribution of consumer expenditure would be if we changed
the income tax, or what the distribution of housing prices would be if we cleaned up a local
hazardous waste site. In other examples, we might be interested in what the distribution
of wages for female workers would be in the absence of gender discrimination in the labor
market (e.g., if female workers were paid as male workers with the same characteristics),
or what the distribution of wages for black workers would be in the absence of racial
discrimination in the labor market (e.g., if black workers were paid as white workers with
the same characteristics). More generally, we can think of a policy intervention either
as a change in the distribution of a set of explanatory variables X that determine the
outcome variable of interest Y, or as a change in the conditional distribution of Y given
X. Policy analysis consists of estimating the effect on the distribution of Y of a change in
the distribution of X or in the conditional distribution of Y given X.

In this paper we develop procedures to perform inference in regression models about 
how these counterfactual policy interventions affect the entire marginal distribution of Y. 
The main assumption is that either the policy does not alter the conditional distribution
of Y given X and only alters the marginal distribution of X, or that the policy does not
alter the marginal distribution of X and only alters the conditional distribution of Y given
X. Starting from estimates of the conditional distribution or quantile functions of the
outcome given covariates, we obtain uniformly consistent estimates for functionals of the
marginal distribution function of the outcome before and after the intervention. Examples
of these functionals include distribution functions, quantile functions, quantile policy
effects, distribution policy effects, means, variances, and Lorenz curves. We then construct
confidence sets around these estimates that take into account the sampling variation
coming from the estimation of the conditional model. These confidence sets are uniform in the
sense that they cover the entire functional of interest with pre-specified probability. Our
analysis specifically targets and covers the principal approaches to estimating conditional
distribution models most often used in empirical work, including classical, quantile,
duration, and distribution regressions. Moreover, our approach can be used to analyze the
effect of both simple interventions consisting of unitary changes in the values of a given
covariate and more elaborate policies consisting of general changes in the covariate
distribution or in the conditional distribution of the outcome given covariates. In addition,
the counterfactual distribution of X and conditional distribution of Y given X can
correspond to known transformations of these distributions or to the distributions in a different
subpopulation or group. This array of alternatives allows us to answer a wide variety of
policy questions such as the ones mentioned in the first paragraph.

To develop the inference results, we establish the functional (Hadamard) differentiability
of the marginal distribution functions before and after the policy with respect to the limit
of the functional estimators of the conditional model of the outcome given the covariates.
By means of the functional delta method, this result allows us to derive the asymptotic
distribution of the functionals of interest, taking into account the sampling variation coming
from the first-stage estimation of the relationship between the outcome and covariates.
Moreover, this general approach based on functional differentiability allows us to establish
the validity of convenient resampling methods, such as bootstrap and other simulation
methods, to make uniform inference on the functionals of interest. Because our analysis
relies only on the conditional quantile estimators or conditional distribution estimators
satisfying a functional central limit theorem, it applies quite broadly, and we show that it
covers the major regression methods listed above. As a consequence, we cover a wide array of
techniques, though in the discussion we devote attention primarily to the most practical
and commonly used methods of estimating conditional distribution and quantile functions.

This paper contributes to the previous literature on estimating policy effects using
regression methods. In particular, important developments include the work of Stock (1989),
which introduced regression-based estimators to evaluate the mean effect of policy
interventions, and of Gosling, Machin, and Meghir (2000) and Machado and Mata (2005),
which proposed quantile regression-based policy estimators to evaluate distributional
effects of policy interventions, but did not provide distribution or inference theory for these
estimators. Our paper contributes to this literature by providing regression-based policy
estimators to evaluate quantile, distributional, and other effects (e.g., Lorenz and Gini
effects) of a general policy intervention and by deriving functional limit theory as well
as practical inferential tools for these policy estimators. Our policy estimators are based
on a rich variety of regression models for the conditional distribution, including classical,
quantile, duration, and distribution regressions.^1 In particular, our theory covers the
previous estimators of Gosling, Machin, and Meghir (2000) and Machado and Mata (2005) as
important special cases. In fact, our limit theory is generic and applies to any estimator of
the conditional distribution that satisfies a functional central limit theorem. Accordingly,
we cover not only a wide array of the most practical current approaches for estimating
conditional distributions, but also many other existing and future approaches, including,
for example, approaches that accommodate endogeneity (Abadie, Angrist, and Imbens,
2002, Chesher, 2003, Chernozhukov and Hansen, 2005, and Imbens and Newey, 2009).^2

Our paper is also related to the literature that evaluates policy effects and treatment 
effects using propensity score methods. The influential article of DiNardo, Fortin, and 
Lemieux (1996) developed estimators for counterfactual densities using propensity score 
reweighting in the spirit of Horvitz and Thompson (1952). Important related work by 
Hirano, Imbens, and Ridder (2003) and Firpo (2007) used a similar reweighting approach 
in exogenous treatment effects models to construct efficient estimators of average and 
quantile treatment effects, respectively. As we comment later in the paper, it is possible 
to adapt the reweighting methods of these articles to develop policy estimators and limit 
theory for such estimators. Here, however, we focus on developing inferential theory for 
policy estimators based on regression methods, thus supporting empirical research using 
regression techniques as its primary method (Buchinsky, 1994, Chamberlain, 1994, Han 
and Hausman, 1990, Machado and Mata, 2005). The recent book of Angrist and Pischke 
(2008, Chap. 3) provides a nice comparative discussion of regression and propensity score 
methods. Finally, related work by Firpo, Fortin, and Lemieux (2007) studies the effects
of special policy interventions consisting of marginal changes in the values of the
covariates. As we comment later in the paper, their approach, based on a linearization of the
functional of interest, is quite different from ours. In particular, our approach focuses
on more general non-marginal changes in both the marginal distribution of covariates and
the conditional distribution of the outcome given covariates.

""^We focus on semi-parametric estimators due to their dominant role in empirical work (Angrist and 
Pischke, 2008). In contrast, fully nonparametric estimators are practical only in situations with a small 
number of regressors. In future work, however, we hope to extend the analysis to nonparametric estimators. 

^2 In this case, the literature provides estimators for $F_{Y_d}$, the distribution of the potential outcome $Y_d$ under
treatment d, and $F_{D,Z}$, the joint distribution of (endogenously determined) treatment status D and
exogenous regressors Z before and after the policy. As long as the estimator of $F_{Y_d}$ satisfies the functional
central limit theorem specified in the main text and the estimator of $F_{D,Z}$ satisfies the functional central
limit theorem specified in Appendix D, our inferential theory applies to the resulting policy estimators.
We illustrate our estimation and inference procedures with an analysis of the evolution of
the U.S. wage distribution. Our analysis is motivated by the influential article by DiNardo,
Fortin, and Lemieux (1996), which studied the institutional and labor market determinants
of the changes in the wage distribution between 1979 and 1988 using data from the CPS.
We complement and expand their analysis by using a wider range of techniques, including
quantile regression and distribution regression, providing standard errors for the estimates
of the main effects, and extending the analysis to the entire distribution using simultaneous
confidence bands. Our results reinforce the importance of the decline in the real minimum
wage in explaining the increase in wage inequality. They also indicate the importance of
changes in both the composition of the workforce and the returns to worker characteristics
in explaining the evolution of the entire wage distribution. Our results show that, after
controlling for other composition effects, the process of de-unionization during the 1980s
played a minor role in explaining the evolution of the wage distribution.

We organize the rest of the paper as follows. In Section 2 we describe methods for
performing counterfactual analysis, set up the modeling assumptions for the
counterfactual outcomes, and introduce the policy estimators. In Section 3 we derive distributional
results and inferential procedures for the policy estimators. In Section 4 we present the
empirical application, and in Section 5 we summarize the main results. In the
Appendix, we include proofs and additional theoretical results.

2. Methods for Counterfactual Analysis 

2.1. Observed and counterfactual outcomes. In our analysis it is important to
distinguish between observed and counterfactual outcomes. Observed outcomes come from the
population before the policy intervention, whereas (unobserved) counterfactual outcomes
come from the population after the potential policy intervention. We use the observed
outcomes and covariates to establish the relationship between outcome and covariates and
the distribution of the covariates, which, together with either a postulated distribution of
the covariates under the policy or a postulated conditional distribution of outcomes given
covariates under the policy, determine the distribution of the outcome after the policy
intervention, under conditions precisely stated below.

We divide our population into two groups or subpopulations indexed by $j \in \{0, 1\}$. Index 0
corresponds to the status quo or reference group, whereas index 1 corresponds to the
group from which we obtain the marginal distribution of X or the conditional distribution
of Y given X to generate the counterfactual outcome distribution.^3 In order to discuss
various regression models of outcomes given covariates, it is convenient to consider the
following representation. Let $Q_{Y_j}(u \mid x)$ be the conditional u-quantile of Y given X in
group j, and let $F_{X_k}$ be the marginal distribution of the p-vector of covariates X in group
k, for $j, k \in \{0, 1\}$. We can describe the observed outcome $Y_j^j$ in group j as a function of
covariates and a non-additive disturbance $U_j$ via the Skorohod representation:

Y^^ = QY^{Ui\Xj), where f/| ~ U{0, 1) independently of Xj ~ Fx,, for j e {0, 1}. 

Here the conditional quantile function plays the role of a link function. More generally,
we can think of $Q_{Y_j}(u \mid x)$ as a structural or causal function mapping the covariates and
the disturbance to the outcome, where the covariate vector can include control variables
to account for endogeneity. In the classical regression model, the disturbance is separable
from the covariates, as in the location-shift model described below, but generally it need
not be. Our analysis covers either case.

We consider two different counterfactual experiments. The first experiment consists
of drawing the vector of covariates from the distribution of covariates in group 1, i.e.,
$X_1 \sim F_{X_1}$, while keeping the conditional quantile function as in group 0, $Q_{Y_0}(u \mid x)$. The
counterfactual outcome $Y_0^1$ is therefore generated by

$$Y_0^1 := Q_{Y_0}(U_0^1 \mid X_1), \quad \text{where } U_0^1 \sim U(0, 1) \text{ independently of } X_1 \sim F_{X_1}. \tag{2.1}$$

This construction assumes that we can evaluate the quantile function $Q_{Y_0}(u \mid x)$ at each
point x in the support of $X_1$. This requires that either the support of $X_1$ is a subset of
the support of $X_0$ or we can extrapolate the quantile function outside the support of $X_0$.
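As a minimal illustration of (2.1), the Python sketch below simulates the observed outcome through the Skorohod representation and the first counterfactual experiment, assuming a hypothetical logistic location model for group 0, $Q_{Y_0}(u \mid x) = b_0 x + \log(u/(1-u))$. The coefficient `b0 = 2`, the normal covariate distributions, and the function names are illustrative choices, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def q_y0(u, x, b0=2.0):
    """Hypothetical group-0 conditional quantile function Q_{Y0}(u|x):
    a linear location model with a standard logistic disturbance."""
    return b0 * x + np.log(u / (1.0 - u))

n = 200_000
x0 = rng.normal(0.0, 1.0, n)   # covariates in group 0 (status quo)
x1 = rng.normal(0.5, 1.0, n)   # covariates in group 1 (assumed shifted mean)

# Observed outcome via the Skorohod representation: Y_0^0 = Q_{Y0}(U_0 | X_0).
y00 = q_y0(rng.uniform(size=n), x0)
# Counterfactual outcome (2.1): group-0 conditional quantiles, group-1 covariates.
y01 = q_y0(rng.uniform(size=n), x1)

# The logistic disturbance has mean zero, so E[Y_0^1] = 2.0 * 0.5 = 1.0 here.
print(y00.mean(), y01.mean())
```

Only the covariate draws change between the two simulated outcomes; the map from $(U, X)$ to the outcome is held fixed, which is exactly what (2.1) assumes.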

For purposes of analysis, it is useful to distinguish two different ways of constructing
the alternative distributions of the covariates. (1) The covariates before and after the
policy arise from two different populations or subpopulations. These populations might
correspond to different demographic groups, time periods, or geographic locations. Specific
examples include the distributions of worker characteristics in different years and
distributions of socioeconomic characteristics for black versus white mothers. (2) The
covariates under the policy intervention arise as some known transformation of the
covariates in group 0; that is, $X_1 = g(X_0)$, where $g(\cdot)$ is a known function. This case covers, for

^3 Our results also cover the policy intervention of changing both the marginal distribution of X and
the conditional distribution of Y given X. In this case the counterfactual outcome corresponds to the
observed outcome in group 1.
example, unitary changes in the location of one of the covariates,

$$X_1 = X_0 + e_j,$$

where $e_j$ is a unit p-vector with a one in position j, or mean-preserving redistributions
of the covariates implemented as $X_1 = (1 - \alpha)E[X_0] + \alpha X_0$. These types of policies
are useful for estimating the effect of smoking on the marginal distribution of infant birth
weights, the effect of a change in taxation on the marginal distribution of food expenditure,
or the effect of cleaning up a local hazardous waste site on the marginal distribution
of housing prices (Stock, 1991). Even though these two cases correspond to conceptually
different thought experiments, our econometric analysis covers either situation within
a unified framework.
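The two known transformations $g$ mentioned above are easy to state in code. The Python sketch below is illustrative only: it uses 0-based column indices (so `j=1` shifts the second covariate), replaces $E[X_0]$ by the sample mean, and the function names are hypothetical.

```python
import numpy as np

def unit_shift(x0, j):
    """X_1 = X_0 + e_j: add one to covariate j (0-based column index)."""
    x1 = x0.copy()
    x1[:, j] += 1.0
    return x1

def mean_preserving_spread(x0, alpha):
    """X_1 = (1 - alpha) E[X_0] + alpha X_0, with E[X_0] estimated by the sample mean.
    The mean is unchanged and each variance is multiplied by alpha**2."""
    return (1.0 - alpha) * x0.mean(axis=0) + alpha * x0

rng = np.random.default_rng(1)
x0 = rng.normal(size=(1_000, 3))
x1_shift = unit_shift(x0, j=1)
x1_spread = mean_preserving_spread(x0, alpha=0.5)
```

Either array can then play the role of the group-1 covariates in the counterfactual (2.1).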

The second experiment consists of generating the outcome from the conditional quantile
function in group 1, $Q_{Y_1}(u \mid x)$, while keeping the marginal distribution of the covariates
as in group 0, that is, $X_0 \sim F_{X_0}$. The counterfactual outcome $Y_1^0$ is therefore generated by

$$Y_1^0 := Q_{Y_1}(U_1^0 \mid X_0), \quad \text{where } U_1^0 \sim U(0, 1) \text{ independently of } X_0 \sim F_{X_0}. \tag{2.2}$$

This construction assumes that we can evaluate the quantile function $Q_{Y_1}(u \mid x)$ at each
point x in the support of $X_0$. This requires that either the support of $X_0$ is a subset of
the support of $X_1$ or we can extrapolate the quantile function outside the support of $X_1$.
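A rough diagnostic for this support requirement is to flag the group-0 covariate values that fall outside the range observed in group 1, where $Q_{Y_1}(u \mid x)$ would have to be extrapolated. The Python sketch below is a crude heuristic, not a procedure from the paper: it approximates the support by a componentwise bounding box, and it pairs the check with a simulation of (2.2) under a hypothetical logistic location model for group 1.

```python
import numpy as np

def outside_box(x_target, x_source):
    """Flag rows of x_target outside the componentwise range of x_source.
    A bounding box is only a crude proxy for the support: a point inside the
    box can still lie in a region with no source observations."""
    lo, hi = x_source.min(axis=0), x_source.max(axis=0)
    return np.any((x_target < lo) | (x_target > hi), axis=1)

def q_y1(u, x, b1=1.0):
    """Hypothetical group-1 conditional quantile function Q_{Y1}(u|x)."""
    return b1 * x[:, 0] + np.log(u / (1.0 - u))

rng = np.random.default_rng(2)
x0 = rng.uniform(0.0, 2.0, size=(50_000, 1))   # group-0 covariates
x1 = rng.uniform(0.0, 1.5, size=(50_000, 1))   # group-1 covariates, narrower range

flags = outside_box(x0, x1)
share_extrapolated = flags.mean()               # roughly 0.25 in this design
# Counterfactual outcome (2.2): group-1 conditional quantiles, group-0 covariates.
y10 = q_y1(rng.uniform(size=x0.shape[0]), x0)
```

In this toy design about a quarter of the group-0 covariate values require extrapolating the group-1 model, which is the situation Condition M below has to rule in explicitly.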

In this second experiment, the conditional quantile functions before and after the policy
intervention may arise from two different populations or subpopulations. These
populations might correspond to different demographic groups, time periods, or geographic
locations. This type of policy is useful for conceptualizing, for example, what the
distribution of wages for female workers would be if they were paid as male workers with the
same characteristics, or similarly for blacks or other minority groups.

We formally state the assumptions mentioned above as follows: 

Condition M. Counterfactual outcome variables of interest are generated by either
(2.1) or (2.2). The conditional distributions of the outcome given the covariates in both
groups, namely the conditional quantile functions $Q_{Y_j}(\cdot \mid \cdot)$ or the conditional distribution
functions $F_{Y_j}(\cdot \mid \cdot)$ for $j \in \{0, 1\}$, apply or can be extrapolated to all $x \in \mathcal{X}$, where $\mathcal{X}$ is a
compact subset of $\mathbb{R}^p$ that contains the supports of $X_0$ and $X_1$.

2.2. Parameters of interest. The primary (function-valued) parameters of interest are 
the distribution and quantile functions of the outcome before and after the policy as well 
as functionals derived from them. 



In order to define these parameters, we first recall that the conditional distribution
associated with the quantile function $Q_{Y_j}(u \mid x)$ is:

$$F_{Y_j}(y \mid x) = \int_0^1 1\{Q_{Y_j}(u \mid x) \le y\}\, du, \quad j \in \{0, 1\}. \tag{2.3}$$
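Formula (2.3) can be checked numerically by replacing the integral over $u$ with a midpoint rule. The Python sketch below assumes a hypothetical quantile function $Q(u \mid x) = x + \log(u/(1-u))$, chosen because its distribution function $1/(1+\exp(-(y-x)))$ is known in closed form; the grid size is an arbitrary choice.

```python
import numpy as np

def cdf_from_quantile(q_fn, y, x, n_grid=10_000):
    """Approximate F(y|x) = int_0^1 1{Q(u|x) <= y} du with a midpoint rule on (0, 1)."""
    u = (np.arange(n_grid) + 0.5) / n_grid   # midpoints avoid u = 0 and u = 1
    return np.mean(q_fn(u, x) <= y)

def q_logistic(u, x):
    """Hypothetical Q(u|x) = x + log(u / (1 - u)) with exact CDF 1 / (1 + exp(-(y - x)))."""
    return x + np.log(u / (1.0 - u))

approx = cdf_from_quantile(q_logistic, y=0.7, x=0.2)
exact = 1.0 / (1.0 + np.exp(-(0.7 - 0.2)))
```

Because the integrand is an indicator, the approximation error is at most one grid step, `1 / n_grid`, and the formula does not require inverting the quantile function analytically.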



Given our definitions (2.1) or (2.2) of the counterfactual outcome, the marginal
distributions of interest are:

$$F_{Y_j}^k(y) := \Pr\{Y_j^k \le y\} = \int_{\mathcal{X}} F_{Y_j}(y \mid x)\, dF_{X_k}(x), \quad j, k \in \{0, 1\}. \tag{2.4}$$
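In practice the integral in (2.4) over $F_{X_k}$ becomes an average over a covariate sample from group k. The Python sketch below assumes a hypothetical logistic conditional distribution $F_{Y_0}(y \mid x) = \Lambda(y - 2x)$ and a two-point toy covariate distribution for group 1; both are illustrative choices.

```python
import numpy as np

def logistic_cdf(v):
    return 1.0 / (1.0 + np.exp(-v))

def marginal_cdf(cond_cdf, y_grid, x_sample):
    """F^k_{Y_j}(y) = int F_{Y_j}(y|x) dF_{X_k}(x), with the integral over F_{X_k}
    replaced by an average over a sample x_sample drawn from F_{X_k}."""
    # Rows index grid points y, columns index covariate draws x_i.
    return cond_cdf(y_grid[:, None], x_sample[None, :]).mean(axis=1)

def cond_cdf_y0(y, x, b0=2.0):
    """Hypothetical F_{Y0}(y|x) for the logistic location model Q_{Y0}(u|x) = b0 x + logit(u)."""
    return logistic_cdf(y - b0 * x)

y_grid = np.linspace(-6.0, 6.0, 121)
x1_sample = np.array([-1.0, 1.0])    # toy covariate distribution for group 1
f01 = marginal_cdf(cond_cdf_y0, y_grid, x1_sample)
```

With the symmetric covariate sample, the counterfactual distribution function equals 0.5 at $y = 0$.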

The corresponding marginal quantile functions are: 

$$Q_{Y_j}^k(u) = \inf\{y : F_{Y_j}^k(y) \ge u\}, \quad j, k \in \{0, 1\}.$$

The u-quantile policy effect and the y-distribution policy effect are:

$$QE_{Y_j}^k(u) = Q_{Y_j}^k(u) - Q_{Y_0}^0(u) \quad \text{and} \quad DE_{Y_j}^k(y) = F_{Y_j}^k(y) - F_{Y_0}^0(y), \quad j, k \in \{0, 1\}.$$
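The quantile function is the generalized inverse $\inf\{y : F(y) \ge u\}$, which handles flat stretches of $F$ correctly. The Python sketch below computes it for distribution functions tabulated on a common sorted grid and forms the two policy effects; the toy tables `f_sq` and `f_cf` are hypothetical numbers chosen to include a flat stretch.

```python
import numpy as np

def quantile_from_cdf(y_grid, f_vals, u):
    """Generalized inverse Q(u) = inf{y : F(y) >= u} for a CDF tabulated on a sorted grid.
    Assumes f_vals is nondecreasing; u above max(f_vals) maps to the last grid point."""
    idx = np.searchsorted(f_vals, u, side="left")   # first index with f_vals[idx] >= u
    return y_grid[np.minimum(idx, len(y_grid) - 1)]

def quantile_effect(y_grid, f_cf, f_sq, u):
    """QE(u): counterfactual minus status-quo quantiles."""
    return quantile_from_cdf(y_grid, f_cf, u) - quantile_from_cdf(y_grid, f_sq, u)

def distribution_effect(f_cf, f_sq):
    """DE(y): counterfactual minus status-quo CDF on the common grid."""
    return f_cf - f_sq

y_grid = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
f_sq = np.array([0.1, 0.4, 0.4, 0.9, 1.0])   # status quo, playing the role of F^0_{Y_0}
f_cf = np.array([0.0, 0.2, 0.5, 0.8, 1.0])   # counterfactual, e.g. F^1_{Y_0}
u = np.array([0.05, 0.4, 0.5])
```

For example, the status-quo quantiles at `u` are `[1, 2, 4]`: at $u = 0.4$ the infimum picks the left end of the flat stretch.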

It is useful to mention a couple of examples to understand the notation. For instance,
$Q_{Y_0}^1(u) - Q_{Y_0}^0(u)$ is the quantile effect under a policy that changes the marginal distribution
of covariates from $F_{X_0}$ to $F_{X_1}$, fixing the conditional distribution of the outcome to $F_{Y_0}(y \mid x)$.
On the other hand, $Q_{Y_1}^0(u) - Q_{Y_0}^0(u)$ is the quantile effect under a policy that changes
the conditional distribution of the outcome from $F_{Y_0}(y \mid x)$ to $F_{Y_1}(y \mid x)$, fixing the marginal
distribution of covariates to $F_{X_0}$.

Other parameters of interest include, for example, Lorenz curves of the observed and
counterfactual outcomes. Lorenz curves, commonly used to measure inequality, are ratios
of partial means to overall means,

$$L(y, F_{Y_j}^k) = \int_0^y t\, dF_{Y_j}^k(t) \Big/ \int_0^\infty t\, dF_{Y_j}^k(t),$$

defined for non-negative outcomes only. More generally, we might be interested in arbitrary
functionals of the marginal distributions of the outcome before and after the interventions,

$$H_Y(y) := \phi\big(y, F_{Y_0}^0, F_{Y_0}^1, F_{Y_1}^0, F_{Y_1}^1\big). \tag{2.5}$$
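For a sample of non-negative outcomes, the Lorenz curve above reduces to the share of the total accounted for by observations at or below $y$. A minimal Python sketch of this plug-in version follows; the function name is hypothetical.

```python
import numpy as np

def lorenz(y, sample):
    """Empirical L(y, F) = int_0^y t dF(t) / int_0^inf t dF(t) for a non-negative sample."""
    sample = np.asarray(sample, dtype=float)
    if np.any(sample < 0):
        raise ValueError("Lorenz curves are defined for non-negative outcomes only")
    y = np.atleast_1d(np.asarray(y, dtype=float))
    partial = np.array([sample[sample <= v].sum() for v in y])
    return partial / sample.sum()

# For the sample [1, 2, 3, 4], observations up to 2 hold (1 + 2) / 10 = 30% of the total.
shares = lorenz([0.5, 2.0, 4.0], [1, 2, 3, 4])
```

Applied to counterfactual and status-quo samples, the difference of the two curves gives a plug-in Lorenz policy effect.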

These functionals include the previous examples as special cases as well as other examples
such as means, with $H_Y(y) = \int_{\mathcal{Y}} t\, dF_{Y_j}^k(t) =: \mu_{Y_j}^k$; mean policy effects, with
$H_Y(y) = \mu_{Y_j}^k - \mu_{Y_0}^0$; variances, with $H_Y(y) = \int_{\mathcal{Y}} t^2\, dF_{Y_j}^k(t) - (\mu_{Y_j}^k)^2 =: (\sigma_{Y_j}^k)^2$;
variance policy effects, with $H_Y(y) = (\sigma_{Y_j}^k)^2 - (\sigma_{Y_0}^0)^2$; Lorenz policy effects, with
$H_Y(y) = L(y, F_{Y_j}^k) - L(y, F_{Y_0}^0) =: LE_{Y_j}^k(y)$; Gini coefficients, with
$H_Y(y) = 1 - 2\int_0^1 L(Q_{Y_j}^k(u), F_{Y_j}^k)\, du =: G_{Y_j}^k$; and Gini policy effects, with
$H_Y(y) = G_{Y_j}^k - G_{Y_0}^0 =: GE_{Y_j}^k$.^4
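The scalar functionals in this list have simple sample analogues. The Python sketch below computes mean, variance, and Gini policy effects from two hypothetical samples. For a finite sample it uses the standard formula $\sum_i (2i - n - 1) y_{(i)} / (n \sum_i y_i)$, which equals one minus twice the area under the piecewise-linear empirical Lorenz curve; for continuous distributions this area coincides with the integral in the Gini definition above.

```python
import numpy as np

def gini(sample):
    """Gini coefficient of a non-negative sample:
    sum_i (2i - n - 1) y_(i) / (n * sum_i y_i), with y_(1) <= ... <= y_(n)."""
    s = np.sort(np.asarray(sample, dtype=float))
    n = s.size
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) * s) / (n * s.sum())

def policy_effects(cf, sq):
    """Mean, variance, and Gini policy effects: counterfactual minus status quo."""
    cf, sq = np.asarray(cf, dtype=float), np.asarray(sq, dtype=float)
    return {
        "mean": cf.mean() - sq.mean(),
        "variance": cf.var() - sq.var(),   # population variance (ddof=0)
        "gini": gini(cf) - gini(sq),
    }

# Doubling every outcome doubles the mean, quadruples the variance, and leaves Gini unchanged.
eff = policy_effects([2, 4, 6, 8], [1, 2, 3, 4])
```

The usage line illustrates that the Gini coefficient is scale invariant while the mean and variance effects are not.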

In the case where the policy consists of either a known transformation of the covariates,
$X_1 = g(X_0)$, or a change in the conditional distribution of Y given X, we can also identify
the distribution and quantile functions of the effect of the policy, $\Delta_j := Y_j^{1-j} - Y_0^0$, by

$$F_{\Delta_j}(\delta) = \int_{\mathcal{X}} \int_0^1 1\{Q_{\Delta_j}(u \mid x) \le \delta\}\, du\, dF_{X_0}(x), \quad j \in \{0, 1\}, \tag{2.6}$$

where $Q_{\Delta_0}(u \mid x) = Q_{Y_0}(u \mid g(x)) - Q_{Y_0}(u \mid x)$ and $Q_{\Delta_1}(u \mid x) = Q_{Y_1}(u \mid x) - Q_{Y_0}(u \mid x)$; and

$$Q_{\Delta_j}(u) = \inf\{\delta : F_{\Delta_j}(\delta) \ge u\}, \quad j \in \{0, 1\}, \tag{2.7}$$

under the additional assumption (Heckman, Smith, and Clements, 1997):

Condition RP. Conditional rank preservation: $U_0^1 = U_0$ conditional on $X_0$, and $U_1^0 = U_0$ conditional on $X_0$.
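Under rank preservation, the effect for a unit with covariate $x$ and rank $u$ is the difference of two conditional quantile functions at the same $u$, so (2.6) is again an indicator integral. The Python sketch below evaluates it with a midpoint grid in $u$ and an average over a group-0 covariate sample; the quantile functions `q_y0`, `q_y1` and the transformation `g(x) = x + 1` are hypothetical.

```python
import numpy as np

def effect_cdf(q_delta, delta, x_sample, n_grid=2_000):
    """F_Delta(delta) = int int_0^1 1{Q_Delta(u|x) <= delta} du dF_{X_0}(x), with the
    u-integral on a midpoint grid and the x-integral as an average over x_sample."""
    u = (np.arange(n_grid) + 0.5) / n_grid
    vals = q_delta(u[None, :], x_sample[:, None])   # shape (n_x, n_grid)
    return np.mean(vals <= delta)

def q_y0(u, x):
    return x + np.log(u / (1.0 - u))                # hypothetical Q_{Y0}(u|x)

def q_y1(u, x):
    return 2.0 * x + 1.5 * np.log(u / (1.0 - u))    # hypothetical Q_{Y1}(u|x)

def g(x):
    return x + 1.0                                  # hypothetical known transformation

def q_delta0(u, x):
    """Q_{Delta_0}(u|x) = Q_{Y0}(u|g(x)) - Q_{Y0}(u|x): effect of moving covariates by g."""
    return q_y0(u, g(x)) - q_y0(u, x)

def q_delta1(u, x):
    """Q_{Delta_1}(u|x) = Q_{Y1}(u|x) - Q_{Y0}(u|x): effect of switching conditional models."""
    return q_y1(u, x) - q_y0(u, x)
```

In this location model `q_delta0` is the constant 1 for every unit, so its effect distribution is a point mass at one, while `q_delta1` produces heterogeneous effects $x + 0.5 \log(u/(1-u))$.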

2.3. Conditional models. The preceding analysis shows that the marginal distribution
and quantile functions of interest depend on either the underlying conditional quantile
function or conditional distribution function. Thus, we can proceed by modeling and
estimating either of these conditional functions. We can rely on several principal approaches
to carrying out these tasks. In this section we drop the dependence on the group index to
simplify the notation.

Example 1. Classical regression and generalizations. Classical regression is one
of the principal approaches to modeling and estimating conditional quantile functions.
The classical location-shift model takes the form

$$Y = m(X) + V, \quad V = Q_V(U), \tag{2.8}$$

where $U \sim U(0, 1)$ is independent of X, and $m(\cdot)$ is a location function such as the
conditional mean. The disturbance V has the quantile function $Q_V(u)$, and Y therefore has
conditional quantile function $Q_Y(u \mid x) = m(x) + Q_V(u)$. This model is parsimonious in that
covariates impact the outcome only through the location. Even though this is a location
model, it is clear that a general change in the distribution of covariates or the conditional
quantile function can have heterogeneous effects on the entire marginal distribution of Y,
affecting its various quantiles in a differential manner. The most common model for the



^4 In the rest of the discussion we keep the distribution, quantile, quantile policy effects, and distribution
policy effects functions as separate cases to emphasize the importance of these functionals in practice.
Lorenz curves are special cases of the general functional with $H_Y(y) = \int_0^y t\, dF_{Y_j}^k(t) / \int_0^\infty t\, dF_{Y_j}^k(t)$, and
will not be considered separately.
regression function m(x) is linear in parameters, $m(x) = x'\beta$, and we can estimate it using
least squares or instrumental variable methods. We can leave the quantile function $Q_V(u)$
unrestricted and estimate it using the empirical quantile function of the residuals. Our
results cover such common estimation schemes as special cases, since we only require the
estimates to satisfy a functional central limit theorem.
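The estimation scheme just described (least squares for $\beta$, empirical residual quantiles for $Q_V$) fits in a few lines. The Python sketch below is one possible implementation under the linear specification $m(x) = x'\beta$; the simulated data, with a heavy-tailed Student-t disturbance, are hypothetical.

```python
import numpy as np

def fit_location_shift(X, y):
    """Estimate Q_Y(u|x) = x'beta + Q_V(u) in the location-shift model (2.8):
    beta by least squares, Q_V by empirical quantiles of the residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    def q_hat(u, x):
        # x: (m, p) covariate rows; u: (K,) quantile indices -> array of shape (m, K)
        return (x @ beta)[:, None] + np.quantile(resid, u)[None, :]

    return beta, q_hat

rng = np.random.default_rng(3)
n = 5_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])          # intercept + one regressor
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=5, size=n)    # heavy-tailed V
beta_hat, q_hat = fit_location_shift(X, y)
```

Because the model is a pure location shift, the estimated conditional quantile curves for two covariate values are parallel: their gap equals the slope estimate at every $u$.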

The location model has played a classical role in regression analysis. Many endogenous
and exogenous treatment effects models, for example, can be analyzed and estimated
using variations of this model (Cameron and Trivedi, 2005, Chap. 25, and Imbens and
Wooldridge, 2008). A variety of standard survival and duration models also imply (2.8)
after a transformation, such as the Cox model with Weibull hazard or the accelerated failure
time model; cf. Doksum and Gasko (1990).

The location-scale shift model is a generalization that enables the covariates to impact
the conditional distribution through the scale function as well:

$$Y = m(X) + \sigma(X) \cdot V, \quad V = Q_V(U),$$

where $U \sim U(0, 1)$ independently of X, and $\sigma(\cdot)$ is a positive scale function. In this model
the conditional quantile function takes the form $Q_Y(u \mid x) = m(x) + \sigma(x) Q_V(u)$. It is clear
that changes in the distribution of X or in $Q_Y(u \mid x)$ can have a nontrivial effect on the
entire marginal distribution of Y, affecting its various quantiles in a differential manner.
This model can be estimated through a variety of means (see, e.g., Rutemiller and Bowers,
1968, and Koenker and Xiao, 2002).
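The point that a scale function makes covariate effects vary across quantiles can be seen directly from $Q_Y(u \mid x) = m(x) + \sigma(x) Q_V(u)$. The Python sketch below uses hypothetical choices $m(x) = 1 + 2x$, $\sigma(x) = \exp(x/2)$, and a logistic $Q_V$, and computes the effect of moving $x$ from 0 to 1 at three quantile indices.

```python
import numpy as np

def logit(u):
    return np.log(u / (1.0 - u))     # quantile function of the standard logistic

def m(x):
    return 1.0 + 2.0 * x             # hypothetical location function

def sigma(x):
    return np.exp(0.5 * x)           # hypothetical positive scale function

def q_location_scale(u, x):
    """Q_Y(u|x) = m(x) + sigma(x) * Q_V(u) with a logistic Q_V."""
    return m(x) + sigma(x) * logit(u)

u = np.array([0.1, 0.5, 0.9])
# Effect of moving x from 0 to 1 at each quantile index u.
shift = q_location_scale(u, 1.0) - q_location_scale(u, 0.0)
```

The shift equals $2 + (e^{1/2} - 1) Q_V(u)$, so it is 2 at the median and larger in the upper tail than in the lower tail.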

Example 2. Quantile regression. We can also rely on quantile regression as a
principal approach to modeling and estimating conditional quantile functions. In this
approach, we have the general non-separable representation

$$Y = Q_Y(U \mid X).$$

The model permits covariates to impact the outcome by changing not only the location
and scale of the distribution but also its entire shape. An early convincing example of such
effects goes back to Doksum (1974), who showed that real data can be sharply inconsistent
with the location-scale shift paradigm. Quantile regression precisely addresses this issue.
The leading approach to quantile regression entails approximating the conditional quantile
function by a linear form $Q_Y(u \mid x) = x'\beta(u)$.^5 Koenker (2005) provides an excellent review
of this method.
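For readers who want to see the linear form $x'\beta(u)$ estimated, the Python sketch below solves the standard linear-programming formulation of linear quantile regression with `scipy.optimize.linprog` (HiGHS solver). SciPy is an assumed dependency, the simulated heteroskedastic data are hypothetical, and dedicated quantile regression software would normally be preferred for larger samples.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression(X, y, tau):
    """Linear quantile regression Q_Y(tau|x) = x'beta(tau), solved as the linear program
    min tau * 1'r_plus + (1 - tau) * 1'r_minus  s.t.  X beta + r_plus - r_minus = y."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1.0 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)   # beta free, residual parts >= 0
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]

rng = np.random.default_rng(4)
n = 400
x = rng.uniform(0.0, 2.0, n)
X = np.column_stack([np.ones(n), x])
# The disturbance scale grows with x, so the true slope is 2 + z_tau (z_tau: normal quantile).
y = 1.0 + 2.0 * x + (0.5 + x) * rng.normal(size=n)
betas = {tau: quantile_regression(X, y, tau) for tau in (0.1, 0.5, 0.9)}
```

In this design the estimated slope should increase with `tau`, which a single location-shift model could not reproduce.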

Quantile regression allows researchers to fit parsimonious models to the entire
conditional distribution. It has become an increasingly important empirical tool in applied
economics. In labor economics, for example, quantile regression has been widely used to
model changes in the wage distribution (Buchinsky, 1994, Chamberlain, 1994, Abadie,
1997, Gosling, Machin, and Meghir, 2000, Machado and Mata, 2005, Angrist,
Chernozhukov, and Fernández-Val, 2006, and Autor, Katz, and Kearney, 2006b). Variations
of quantile regression can be used to obtain quantile and distribution treatment effects in
endogenous and exogenous treatment effects models (Abadie, Angrist, and Imbens, 2002,
Chernozhukov and Hansen, 2005, and Firpo, 2007).

Example 3. Duration regression. A common way to model conditional distribution
functions in duration and survival analysis is through the transformation model:

$$F_Y(y \mid x) = 1 - \exp(-\exp(m(x) + t(y))), \tag{2.9}$$

where $t(\cdot)$ is a monotonic transformation. This model is rather rich, yet the role of
covariates is limited in an important way. In particular, the model leads to the following
location-shift representation:

$$t(Y) = -m(X) + V,$$

where V has an extreme value distribution, $\Pr\{V \le v\} = 1 - \exp(-\exp(v))$, and is independent of X.
Therefore, covariates impact a monotone transformation of the outcome only through the location
function. The estimation of this model is the subject of a large and important literature (e.g., Lancaster,
1990, Donald, Green, and Paarsch, 2000, and Dabrowska, 2005).
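The location-shift representation can be checked by simulation: draw $V$ from the extreme value distribution by inverse-CDF sampling, build $Y$ from $t(Y) = -m(X) + V$, and compare the empirical conditional distribution with (2.9). The Python sketch below assumes $t(y) = \log y$ (a Weibull-type duration) and a hypothetical linear $m(x)$.

```python
import numpy as np

rng = np.random.default_rng(5)

def m(x):
    return 0.5 + 1.0 * x            # hypothetical location function

# Transformation model with t(y) = log(y): t(Y) = -m(X) + V, where V has the
# extreme value distribution P(V <= v) = 1 - exp(-exp(v)).
n = 200_000
x = rng.uniform(0.0, 1.0, n)
v = np.log(-np.log(1.0 - rng.uniform(size=n)))   # inverse-CDF draw of V
y = np.exp(-m(x) + v)                              # positive duration outcome

def cdf_model(y0, x0):
    """F_Y(y|x) = 1 - exp(-exp(m(x) + t(y))) from (2.9) with t = log."""
    return 1.0 - np.exp(-np.exp(m(x0) + np.log(y0)))
```

Restricting attention to draws with $x$ close to 0.5, the share with $Y \le 0.255$ should be close to `cdf_model(0.255, 0.5)`, which is about one half.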

Example 4. Distribution regression. Instead of restricting attention to
transformation models for the conditional distribution, we can consider directly modeling
$F_Y(y \mid x)$ separately for each threshold y. An example is the model

$$F_Y(y \mid x) = \Lambda(m(y, x)),$$

where $\Lambda$ is a known link function and $m(y, x)$ is unrestricted in y. This specification
includes the previous example as a special case (put $\Lambda(v) = 1 - \exp(-\exp(v))$ and
$m(y, x) = m(x) + t(y)$) and allows for more flexible effects of the covariates. The leading example of
^Throughout, by "hnear" we mean specifications that are hnear in the parameters but could be highly 
non-linear in the original covariates; that is, if the original covariate is X, then the conditional quantile 
function takes the form z' f3{u) where z = f{x). 
this specification would be a probit or logit link function $\Lambda$ and $m(y, x) = x'\beta(y)$, where
$\beta(y)$ is an unknown function in y (Han and Hausman, 1990, and Foresi and Peracchi,
1995). This approach is similar in spirit to quantile regression. In particular, like quantile
regression, this approach leads to the specification $Y = Q_Y(U \mid X) = m^{-1}(\Lambda^{-1}(U), X)$,
where $m^{-1}(\cdot, x)$ denotes the inverse of $m(\cdot, x)$ in its first argument and $U \sim U(0, 1)$
independently of X.
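Distribution regression with a logit link amounts to one binary logit of $1\{Y \le y\}$ on $X$ per threshold. The Python sketch below fits each logit by plain Newton-Raphson (no regularization, fixed iteration count) on simulated data from a hypothetical logistic location model, for which the true coefficients are known: $\beta(y) = (y - 1, -2)$.

```python
import numpy as np

def logit_fit(X, d, n_iter=25):
    """Logistic regression of a binary d on X by Newton-Raphson (no regularization)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def distribution_regression(X, y, thresholds):
    """F_Y(y|x) = Lambda(x'beta(y)) with a logit link: one logit of 1{Y <= t} on X
    for each threshold t. Row k of the result is beta(thresholds[k])."""
    return np.array([logit_fit(X, (y <= t).astype(float)) for t in thresholds])

rng = np.random.default_rng(6)
n = 20_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.logistic(size=n)   # logistic location model
thresholds = np.array([0.0, 1.0, 2.0])
beta_y = distribution_regression(X, y, thresholds)
```

Because $\beta(y)$ is left unrestricted across thresholds, the same code accommodates covariate effects that change shape across the distribution; here the slope happens to be constant because the simulated model is a location shift.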

2.4. Policy estimators and inference questions. All of the preceding approaches
generate estimates $\hat F_{Y_j}(y \mid x)$, $j \in \{0, 1\}$, of the conditional distribution functions either directly
or indirectly using the relation (2.3):

$$\hat F_{Y_j}(y \mid x) = \int_0^1 1\{\hat Q_{Y_j}(u \mid x) \le y\}\, du, \quad j \in \{0, 1\}, \tag{2.10}$$

where $\hat Q_{Y_j}(u \mid x)$ is a given estimate of the conditional quantile function.

We then estimate the marginal distribution functions and quantile functions for the
outcome by

$$\hat F_{Y_j}^k(y) = \int \hat F_{Y_j}(y \mid x)\, d\hat F_{X_k}(x) \quad \text{and} \quad \hat Q_{Y_j}^k(u) = \inf\{y : \hat F_{Y_j}^k(y) \ge u\},$$

respectively, for $j, k \in \{0, 1\}$. We estimate the quantile and distribution policy effects by

$$\widehat{QE}_{Y_j}^k(u) = \hat Q_{Y_j}^k(u) - \hat Q_{Y_0}^0(u) \quad \text{and} \quad \widehat{DE}_{Y_j}^k(y) = \hat F_{Y_j}^k(y) - \hat F_{Y_0}^0(y).$$

We estimate the general functionals introduced in (2.5) similarly, using the plug-in rule:

$$\hat H_Y(y) = \phi\big(y, \hat F_{Y_0}^0, \hat F_{Y_0}^1, \hat F_{Y_1}^0, \hat F_{Y_1}^1\big). \tag{2.11}$$

For example, in this way we can construct estimates of the distribution and quantiles of
the effects defined in (2.6) and (2.7).
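The estimators above can be chained in a few lines when the conditional quantile function is estimated by linear quantile regression on a grid of quantile indices. The Python sketch below assumes such a grid of coefficient estimates is already available; `beta_grid` holds small hypothetical numbers, and each row is treated as $\hat\beta(u_k)$ with $u_k$ equally spaced, so averaging indicators over rows approximates the integral in (2.10). The covariate samples are likewise hypothetical.

```python
import numpy as np

def cond_cdf_from_qr(beta_grid, y, x):
    """Estimate (2.10): F_hat(y|x) = (1/K) sum_k 1{x'beta_hat(u_k) <= y}.
    No monotonicity of x'beta_hat(u) in u is needed for this formula."""
    return np.mean(beta_grid @ x <= y)

def marginal_cdf_hat(beta_grid, y_grid, x_sample):
    """Plug-in F_hat^k_{Y_j}(y): average F_hat_{Y_j}(y|x_i) over covariates x_i from group k."""
    fitted = x_sample @ beta_grid.T                 # (n_x, K) fitted conditional quantiles
    return np.array([np.mean(fitted <= y) for y in y_grid])

def quantile_hat(y_grid, f_vals, u):
    """Q_hat(u) = inf{y : F_hat(y) >= u} on a sorted grid (capped at the last grid point)."""
    idx = np.searchsorted(f_vals, u, side="left")
    return y_grid[np.minimum(idx, len(y_grid) - 1)]

# Hypothetical coefficient grid (intercept, slope) at K = 4 quantile indices.
beta_grid = np.array([[-1.0, 1.0], [0.0, 1.0], [0.5, 1.0], [1.0, 1.0]])
x_sq = np.array([[1.0, 0.0], [1.0, 0.0]])   # status-quo covariates (intercept, x = 0)
x_cf = np.array([[1.0, 1.0], [1.0, 1.0]])   # counterfactual covariates (x shifted to 1)
y_grid = np.linspace(-2.0, 3.0, 11)
f_sq = marginal_cdf_hat(beta_grid, y_grid, x_sq)
f_cf = marginal_cdf_hat(beta_grid, y_grid, x_cf)
u = np.array([0.25, 0.5, 0.75])
qe = quantile_hat(y_grid, f_cf, u) - quantile_hat(y_grid, f_sq, u)
```

With a common slope of one and a unit shift in the covariate, the estimated quantile policy effect `qe` equals one at each `u`, as it should.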

Common inference questions that arise in policy analysis involve features of the
distribution of the outcome before and after the intervention. For example, we might be
interested in the average effect of the policy, or in quantile policy effects at several
quantiles to measure the impact of the policy on different parts of the outcome distribution.
More generally, in this analysis many questions of interest involve the entire distribution
or quantile functions of the outcome. Examples include the hypotheses that the policy
has no effect, that the effect is constant, or that it is positive for the entire distribution
(McFadden, 1989, Barrett and Donald, 2003, Koenker and Xiao, 2002, Linton, Maasoumi,
and Whang, 2005). The statistical problem is to account for the sampling variability in
the estimation of the conditional model to make inference on the functionals of interest.
Section 3 provides limit distribution theory for the policy estimators. This theory applies
to the entire marginal distribution and quantile functions of the outcome before and after
the policy, and is therefore valid for performing either uniform inference about the
entire distribution function, quantile function, or other functionals of interest, or pointwise
inference about values of these functions at a specific point.
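To illustrate the difference between uniform and pointwise inference, the Python sketch below builds a simultaneous band for a distribution policy effect $DE(y)$ from the bootstrap distribution of the sup-norm deviation. This is a generic illustration, not the procedure developed in Section 3: it treats two hypothetical outcome samples as independent draws and ignores the first-stage estimation of the conditional model that the paper's theory accounts for.

```python
import numpy as np

def ecdf(sample, grid):
    """Empirical distribution function of sample evaluated on grid."""
    return np.searchsorted(np.sort(sample), grid, side="right") / sample.size

def de_uniform_band(y_cf, y_sq, grid, n_boot=500, level=0.95, seed=0):
    """Simultaneous band for DE(y) = F_cf(y) - F_sq(y) using the bootstrap distribution
    of sup_y |DE*(y) - DE_hat(y)|. Samples are resampled independently."""
    rng = np.random.default_rng(seed)
    de_hat = ecdf(y_cf, grid) - ecdf(y_sq, grid)
    sups = np.empty(n_boot)
    for b in range(n_boot):
        cf_b = rng.choice(y_cf, size=y_cf.size, replace=True)
        sq_b = rng.choice(y_sq, size=y_sq.size, replace=True)
        sups[b] = np.max(np.abs(ecdf(cf_b, grid) - ecdf(sq_b, grid) - de_hat))
    crit = np.quantile(sups, level)
    return de_hat, de_hat - crit, de_hat + crit

rng = np.random.default_rng(7)
y_sq = rng.normal(0.0, 1.0, 1_000)      # hypothetical status-quo outcomes
y_cf = rng.normal(0.3, 1.0, 1_000)      # hypothetical counterfactual outcomes
grid = np.linspace(-3.0, 3.0, 61)
de_hat, lo, hi = de_uniform_band(y_cf, y_sq, grid)
# The hypothesis of no effect anywhere is rejected if zero leaves the band at some y.
reject_no_effect = bool(np.any((lo > 0) | (hi < 0)))
```

A single critical value for the whole grid is what makes the band uniform: it covers the entire function with the stated probability, whereas pointwise intervals would each use their own critical value.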

2.5. Alternative approaches. An alternative way to proceed with policy analysis is to 
use reweighting methods (DiNardo, 2002). Indeed, under Condition M, we can express 
the marginal distribution of the counterfactual outcome in (12. 4p as 

4(2/)= / / l{l7<yK(a;)dFy^(y|x)rfFx,(x), j,fce{0,l}, (2.12) 
Jx Jy 

where $w_k(x) := f_{X_k}(x)/f_{X_j}(x) = (1 - p_k)\,p_k(x)/[p_k(1 - p_k(x))]$ for $k \neq j$ (for $k = j$ the weight is identically one), $p_k(x) := \Pr\{J = k \mid X = x\}$ is the propensity score, $p_k := \Pr\{J = k\}$, $J$ is the group indicator, $f_{X_j}$ is the density of the covariates given $J = j$, and $\mathcal{Y}$ is the support of $Y$. The second form of the weighting function $w_k$ follows from Bayes' rule. We can use expression (2.12) along with either density or propensity score weighting to construct policy estimators. Firpo (2007) used a similar propensity score reweighting approach to derive efficient estimators of quantile effects in treatment effect models. With some work, one can adapt the nice results of Firpo (2007) to obtain the results needed to perform pointwise inference, namely, inference on quantile policy effects at a specific point. However, more work is needed to develop the results required to perform uniform inference on the entire quantile or distribution function. We are carrying out such work in a companion paper.
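A minimal numerical sketch of a reweighting estimator built on (2.12) for $j = 0$, $k = 1$. It assumes a scalar covariate, a logistic propensity score fitted by Newton-Raphson on the pooled sample, and simulated normal data; all of these, and the function names, are our own illustrative choices rather than the procedure of DiNardo (2002) or Firpo (2007).

```python
import numpy as np
from statistics import NormalDist

def fit_logit(X, d, iters=25):
    """Logistic regression of the 0/1 vector d on X (X includes a constant), by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def reweighted_cdf(y_grid, y0, x0, x_all, d_all):
    """Estimate F_{Y_0^1}(y) = int F_{Y_0}(y|x) dF_{X_1}(x) by reweighting group-0 outcomes."""
    X = np.column_stack([np.ones_like(x_all), x_all])
    b = fit_logit(X, d_all)
    p1 = d_all.mean()                                   # estimate of Pr{J = 1}
    px = 1.0 / (1.0 + np.exp(-(b[0] + b[1] * x0)))      # p_1(x) at group-0 covariates
    w = (1.0 - p1) * px / (p1 * (1.0 - px))             # f_{X_1}(x) / f_{X_0}(x) via Bayes' rule
    w = w / w.mean()                                    # normalize weights to average one
    return np.array([np.mean(w * (y0 <= y)) for y in y_grid])

# Illustrative design: X_0 ~ N(0,1), X_1 ~ N(1,1), Y = X + e with e ~ N(0,1),
# so the target is F_{Y_0^1}(y) = Phi((y - 1) / sqrt(2)).
rng = np.random.default_rng(0)
n = 20_000
x0, x1 = rng.normal(0.0, 1.0, n), rng.normal(1.0, 1.0, n)
y0 = x0 + rng.normal(0.0, 1.0, n)
x_all = np.concatenate([x0, x1])
d_all = np.concatenate([np.zeros(n), np.ones(n)])
y_grid = np.array([0.0, 1.0, 2.0, 3.0])
estimate = reweighted_cdf(y_grid, y0, x0, x_all, d_all)
truth = np.array([NormalDist().cdf((y - 1.0) / 2 ** 0.5) for y in y_grid])
print(np.round(estimate, 3), np.round(truth, 3))
```

In this design the logit is correctly specified (the log density ratio is linear in $x$), so the weighted CDF should track the target closely; normalizing the weights is a common finite-sample choice and not part of (2.12).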

In a recent important development, Firpo, Fortin, and Lemieux (2007) propose an alternative useful procedure to estimate policy effects of changes in the distribution of $X$. Given a functional of interest $\phi$, they use a first order approximation of the policy effect:

$$\phi(F_{Y_0^1}) - \phi(F_{Y_0^0}) = \phi'(F_{Y_0^1} - F_{Y_0^0}) + R(F_{Y_0^1}, F_{Y_0^0}),$$

where $\phi'(F_{Y_0^1} - F_{Y_0^0}) = \int a(y, F_{Y_0^0})\, d\big(F_{Y_0^1}(y) - F_{Y_0^0}(y)\big)$ is the first order linear approximation term, the function $a$ is the influence or score function, and $R(F_{Y_0^1}, F_{Y_0^0})$ is the remaining approximation error. In the context of our problem, this approximation error is generally not equal to zero and does not vanish with the sample size. Firpo, Fortin, and Lemieux (2007) propose a practical mean regression method to estimate the first order term $\phi'(F_{Y_0^1} - F_{Y_0^0})$; this method cleverly exploits the law of iterated expectations and the linearity of the term in the distributions. In contrast to our approach, the estimand of this method is an approximation to the policy effect with a non-vanishing approximation error, whereas we directly estimate the exact effect $\phi(F_{Y_0^1}) - \phi(F_{Y_0^0})$ without approximation error.

^See Angrist and Pischke (2008) for a detailed review of propensity score methods and a comparison to regression methods in the context of treatment effect models. The pros and cons of these two methods are also likely to apply to policy analysis. In this paper we focus on the regression method.
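A worked illustration of why the remainder $R$ need not vanish, under our own illustrative assumptions: take $\phi$ to be the $\tau$-quantile, the status quo distribution $F_{Y_0^0}$ standard normal, and the counterfactual $F_{Y_0^1}$ normal with mean $\mu$ and unit variance. The influence function of the $\tau$-quantile is $a(y, F_{Y_0^0}) = (\tau - 1\{y \le q_0\})/f_{Y_0^0}(q_0)$ with $q_0 = Q_{Y_0^0}(\tau)$, so the first order term equals $(\tau - F_{Y_0^1}(q_0))/f_{Y_0^0}(q_0)$, while the exact effect is $\mu$.

```python
from statistics import NormalDist

def quantile_effect_and_linearization(mu, tau=0.5):
    """Exact tau-quantile effect vs. its first-order (influence-function) approximation
    when the status quo is N(0, 1) and the counterfactual is N(mu, 1)."""
    f0, f1 = NormalDist(0.0, 1.0), NormalDist(mu, 1.0)
    q0 = f0.inv_cdf(tau)
    exact = f1.inv_cdf(tau) - q0                  # phi(F1) - phi(F0)
    # int a(y, F0) d(F1 - F0)(y) with a(y, F0) = (tau - 1{y <= q0}) / f0(q0);
    # the integral against dF0 is zero, the one against dF1 is (tau - F1(q0)) / f0(q0)
    linear = (tau - f1.cdf(q0)) / f0.pdf(q0)
    return exact, linear, exact - linear          # last entry: the remainder R

for mu in (0.1, 0.5, 1.0):
    exact, linear, remainder = quantile_effect_and_linearization(mu)
    print(f"mu = {mu}: exact = {exact:.4f}, first order = {linear:.4f}, R = {remainder:.4f}")
```

For $\mu = 1$ the first order term is about 0.856 against an exact median effect of 1. The gap is governed by the size of the distributional change, not by the sample size, which is the sense in which the approximation error does not vanish.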

3. Limit Distribution and Inference Theory for Policy Estimators 

In this section we provide a set of simple, general sufficient conditions that facilitate 
inference in large samples. We design the conditions to cover the principal practical ap- 
proaches and to help us think about what is needed for various approaches to work. Even 
though the conditions are reasonably general, they do not exhaust all scenarios under 
which the main inferential methods will be valid. 

3.1. Conditions on estimators of the conditional distribution and quantile func- 
tions. We provide general assumptions about the estimators of the conditional quantile 
or distribution function, which allow us to derive the limit distribution for the policy es- 
timators constructed from them. These assumptions hold for commonly used parametric 
and semiparametric estimators of conditional distribution and quantile functions, such as 
classical, quantile, duration, and distribution regressions. 

We begin the analysis by stating regularity conditions for estimators of conditional quantile functions, such as classical or quantile regression. In the sequel, let $\ell^\infty((0,1) \times \mathcal{X})$ denote the space of bounded functions mapping from $(0,1) \times \mathcal{X}$ to $\mathbb{R}$, equipped with the uniform metric. We assume we have a sample $\{(X_i, Y_i), i = 1, \ldots, n\}$ of size $n$ for the outcome and covariates before the policy intervention. In this sample $n_0 = n/\lambda_0$ observations come from group 0 and $n_1 = n/\lambda_1$ observations come from group 1. In what follows we use $\Rightarrow$ to denote weak convergence.

Condition C. The conditional density $f_{Y_j}(y \mid x)$ of the outcome given covariates exists, and is continuous and bounded above and away from zero, uniformly on $y \in \mathcal{Y}$ and $x \in \mathcal{X}$, where $\mathcal{Y}$ is a compact subset of $\mathbb{R}$, for $j \in \{0,1\}$.

Condition Q. The estimators $(u,x) \mapsto \hat Q_{Y_j}(u \mid x)$ of the conditional quantile functions $(u,x) \mapsto Q_{Y_j}(u \mid x)$ of the outcome given covariates jointly converge in law to continuous Gaussian processes:

$$\sqrt{n}\,\big(\hat Q_{Y_j}(u \mid x) - Q_{Y_j}(u \mid x)\big) \Rightarrow \sqrt{\lambda_j}\, V_j(u,x), \qquad j \in \{0,1\}, \tag{3.1}$$

in $\ell^\infty((0,1) \times \mathcal{X})$, where $(u,x) \mapsto V_j(u,x)$, $j \in \{0,1\}$, have zero mean and covariance function $\Sigma_{V_{jr}}(u,x,\tilde u,\tilde x) := E[V_j(u,x)V_r(\tilde u,\tilde x)]$, for $j, r \in \{0,1\}$.

These conditions appear reasonable in practice when the outcome is continuous. If the outcome is discrete, Conditions C and Q do not hold; in this case, however, we can use the distribution approach discussed below. Conditions C and Q focus on the case where the outcome has a compact support with a density bounded away from zero, which is a reasonable first case to analyze in detail. Condition Q applies to the most common estimators of conditional quantile functions under suitable regularity conditions (Doss and Gill, 1992, Gutenbrunner and Jureckova, 1992, Angrist, Chernozhukov, and Fernandez-Val, 2006, and Appendix F). Conditions C and Q could be extended to include other cases without affecting subsequent results. For instance, given the set $\mathcal{Y}$ in Condition C over which we want to estimate the counterfactual distribution, Condition Q needs to hold only over the smaller region $\mathcal{UX} = \{(u,x) \in (0,1) \times \mathcal{X} : Q_{Y_j}(u \mid x) \in \mathcal{Y}\} \subseteq (0,1) \times \mathcal{X}$, which leads to a less restrictive convergence requirement without affecting any subsequent results. The joint convergence holds trivially if the samples for each group are mutually independent.

We next state regularity conditions for estimators of conditional distribution functions, such as duration or distribution regressions. Let $\ell^\infty(\mathcal{Y} \times \mathcal{X})$ denote the space of bounded functions mapping from $\mathcal{Y} \times \mathcal{X}$ to $\mathbb{R}$, equipped with the uniform metric, where $\mathcal{Y}$ is a compact subset of $\mathbb{R}$.

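Condition D. The estimators $(y,x) \mapsto \hat F_{Y_j}(y \mid x)$ of the conditional distribution functions $(y,x) \mapsto F_{Y_j}(y \mid x)$ of the outcome given covariates jointly converge in law to continuous Gaussian processes:

$$\sqrt{n}\,\big(\hat F_{Y_j}(y \mid x) - F_{Y_j}(y \mid x)\big) \Rightarrow \sqrt{\lambda_j}\, Z_j(y,x), \qquad j \in \{0,1\}, \tag{3.2}$$

in $\ell^\infty(\mathcal{Y} \times \mathcal{X})$, where $(y,x) \mapsto Z_j(y,x)$, $j \in \{0,1\}$, have zero mean and covariance function $\Sigma_{Z_{jr}}(y,x,\tilde y,\tilde x) := E[Z_j(y,x)Z_r(\tilde y,\tilde x)]$, for $j, r \in \{0,1\}$.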

This condition holds for common estimators of conditional distribution functions (Beran, 1977, Burr and Doss, 1993, and Appendix F). These estimators, however, might produce estimates that are not monotonic in the level of the outcome $y$ (Foresi and Peracchi, 1995, and Hall, Wolff, and Yao, 1999). One way to avoid this problem and to improve the finite sample properties of the conditional distribution estimators is to rearrange the estimates (Chernozhukov, Fernandez-Val, and Galichon, 2006). The joint convergence holds trivially if the samples for each group are mutually independent.
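For a function evaluated on an increasing, equally spaced grid, the increasing rearrangement amounts to sorting its values while keeping the grid fixed. A minimal sketch, assuming conditional CDF estimates for one fixed $x$ stored on such a grid; the numbers are hypothetical.

```python
import numpy as np

def rearrange(values):
    """Increasing rearrangement of a function evaluated on an increasing,
    equally spaced grid: sort the values, keep the grid points unchanged."""
    return np.sort(np.asarray(values, dtype=float))

# Hypothetical non-monotone estimates of F(y|x) on seven equally spaced y points.
F_hat = np.array([0.05, 0.20, 0.18, 0.40, 0.38, 0.70, 1.00])
F_rearranged = rearrange(F_hat)
print(F_rearranged)
```

Sorting leaves the set of values unchanged, so the rearranged curve stays within the range of the original estimates.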
If we start from a conditional quantile estimator $\hat Q_{Y_j}(u \mid x)$, we can define the conditional distribution function estimator $\hat F_{Y_j}(y \mid x)$ using the relation (2.10). It turns out that if the original quantile estimator satisfies Conditions C and Q, then the resulting conditional distribution estimator satisfies Condition D. This result allows us to give a unified treatment of the policy estimators based on either quantile or distribution estimators.
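Relation (2.10) is not reproduced in this section; the sketch below assumes it is the usual inversion $F_{Y_j}(y \mid x) = \int_0^1 1\{Q_{Y_j}(u \mid x) \le y\}\,du$, approximated on a midpoint grid of $u$. The function name and the uniform example are ours.

```python
import numpy as np

def cdf_from_quantiles(q_on_grid, y_values):
    """Approximate F(y|x) = int_0^1 1{Q(u|x) <= y} du by the share of grid points
    u_1 < ... < u_m (assumed equally spaced on (0, 1)) whose estimated quantile is
    at or below y. The count does not require Q(.|x) to be monotone in u."""
    q = np.asarray(q_on_grid, dtype=float)
    y = np.atleast_1d(np.asarray(y_values, dtype=float))
    return (q[None, :] <= y[:, None]).mean(axis=1)

m = 1000
u = (np.arange(m) + 0.5) / m          # midpoint grid on (0, 1)
q_hat = 2.0 * u                        # e.g. Q(u|x) = 2u, i.e. Y | x ~ Uniform(0, 2)
F_hat = cdf_from_quantiles(q_hat, [0.5, 1.0, 1.5])
print(F_hat)
```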

Lemma 1. Under Conditions C and Q, the estimators of the conditional distribution function defined by (2.10) satisfy Condition D with

$$Z_j(y,x) = -f_{Y_j}(y \mid x)\, V_j\big(F_{Y_j}(y \mid x), x\big), \qquad j \in \{0,1\}.$$
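A quick numerical check of the first-order map in Lemma 1, under illustrative assumptions of our own: the conditional distribution is standard normal, the perturbation direction is $v(u) = u$, and (2.10) is taken to be the inversion $\int_0^1 1\{Q(u) \le y\}\,du$. Perturbing $Q$ by $h\,v$ and inverting should move $F(y)$ by approximately $-h\,f(y)\,v(F(y))$.

```python
import numpy as np
from statistics import NormalDist

std = NormalDist()
N, h = 100_000, 0.01
u = (np.arange(N) + 0.5) / N                  # midpoint grid on (0, 1)
q = np.array([std.inv_cdf(p) for p in u])     # Q(u) = Phi^{-1}(u)
v = u                                         # perturbation direction v(u) = u

def invert(qvals, y):
    """Inversion F(y) = int_0^1 1{Q(u) <= y} du on the u-grid."""
    return np.mean(qvals <= y)

results = []
for y in (-1.0, 0.0, 1.0):
    numeric = (invert(q + h * v, y) - invert(q, y)) / h
    predicted = -std.pdf(y) * std.cdf(y)      # -f(y) v(F(y)), since v(F(y)) = F(y)
    results.append((y, numeric, predicted))
    print(f"y = {y:+.1f}: finite difference {numeric:.4f}, Lemma 1 map {predicted:.4f}")
```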

3.2. Examples of Conditional Estimators. Here we verify that the principal estima- 
tors of conditional distribution and quantile functions satisfy the functional central limit 
theorem, which we required to hold in our main Conditions D and Q. In this section we 
drop the dependence on the group index to simplify the notation. 

Example 1 continued. Classical regression. Consider the classical linear regression model $Y = X'\beta_0 + V$, where the disturbance $V$ is independent of $X$ and has mean zero, finite variance, and quantile function $\alpha_0(u)$. In this case, we can estimate $\beta_0$ by mean regression and the quantiles of $V$ by the empirical quantile function of the residuals. We show in Appendix F that the resulting estimator $\hat\theta(u) = (\hat\alpha(u), \hat\beta')'$ of $\theta_0(u) = (\alpha_0(u), \beta_0')'$ obeys a functional central limit theorem $\sqrt{n}(\hat\theta(u) - \theta_0(u)) \Rightarrow G_0(u)^{-1}Z(u)$, where $Z$ is a zero mean Gaussian process with covariance function $\Omega(u,\tilde u)$ specified in (F.6), and the matrix $G_0(u) := G(\alpha_0(u), \beta_0, u)$ is specified in (F.5). The resulting estimator of the conditional quantile function $Q_Y(u \mid x)$, $\hat Q_Y(u \mid x) = \hat\alpha(u) + x'\hat\beta$, obeys a functional central limit theorem,

$$\sqrt{n}\,\big(\hat Q_Y(u \mid x) - Q_Y(u \mid x)\big) \Rightarrow (1, x')G_0(u)^{-1}Z(u) =: V(u,x),$$

in $\ell^\infty((0,1) \times \mathcal{X})$, where $V(u,x)$ is a zero mean Gaussian process with covariance function

$$\Sigma_V(u,x,\tilde u,\tilde x) = (1, x')G_0(u)^{-1}\Omega(u,\tilde u)[G_0(\tilde u)^{-1}]'(1, \tilde x')'.$$
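A minimal sketch of the location-shift estimator in Example 1, assuming a scalar regressor and simulated data. Folding the OLS intercept into $\hat\alpha(u)$ is our convention; the resulting quantile curves are parallel in $u$, which is exactly the restriction the classical regression model imposes.

```python
import numpy as np

def classical_regression_quantiles(Y, X, u_grid):
    """Location-shift model: OLS for the slopes, empirical quantiles of the residuals
    for alpha(u), with the OLS intercept folded into alpha(u).
    Returns a function x -> Q_hat(u|x) evaluated on u_grid."""
    D = np.column_stack([np.ones(len(Y)), X])
    coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
    resid = Y - D @ coef
    alpha = coef[0] + np.quantile(resid, u_grid)      # alpha_hat(u) on the grid
    beta = coef[1:]
    return lambda x: alpha + np.atleast_1d(x) @ beta

rng = np.random.default_rng(1)
n = 20_000
X = rng.normal(size=n)
Y = 1.0 + 2.0 * X + rng.normal(size=n)                # V ~ N(0, 1), independent of X
u_grid = np.array([0.1, 0.5, 0.9])
Q_hat = classical_regression_quantiles(Y, X, u_grid)
print(Q_hat(1.0))   # target in this design: 3 + Phi^{-1}(u)
```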

Example 2 continued. Quantile regression. Consider a linear quantile regression model where $Q_Y(u \mid x) = x'\beta_0(u)$. In Appendix F we show that the canonical quantile regression estimator satisfies a functional central limit theorem, $\sqrt{n}(\hat\beta(u) - \beta_0(u)) \Rightarrow G_0(u)^{-1}Z(u)$, where $Z(u)$ is a zero mean Gaussian process with covariance function $\Omega(u,\tilde u) = \{\min(u,\tilde u) - u\tilde u\}E[XX']$ and $G_0(u) := G(\beta_0(u), u) = -E[f_Y(X'\beta_0(u) \mid X)XX']$. The estimator of the conditional quantile function also obeys a functional central limit theorem,

$$\sqrt{n}\,\big(\hat Q_Y(u \mid x) - Q_Y(u \mid x)\big) = \sqrt{n}\,\big(x'\hat\beta(u) - x'\beta_0(u)\big) \Rightarrow x'G_0(u)^{-1}Z(u) =: V(u,x),$$

in $\ell^\infty((0,1) \times \mathcal{X})$, where $V(u,x)$ is a zero mean Gaussian process with covariance function given by

$$\Sigma_V(u,x,\tilde u,\tilde x) = x'G_0(u)^{-1}\Omega(u,\tilde u)G_0(\tilde u)^{-1}\tilde x.$$
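A minimal sketch of the canonical quantile regression estimator, written as its standard linear program and solved with SciPy's `linprog` (HiGHS backend). This is the textbook LP formulation, not necessarily the implementation behind the results in the paper, and the heteroskedastic simulated design is our own.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression(Y, X, u):
    """Linear quantile regression at level u via the LP
    min u*1'r_plus + (1-u)*1'r_minus  s.t.  X b + r_plus - r_minus = Y,  r_plus, r_minus >= 0."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, u), np.full(n, 1.0 - u)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)   # b free, residual parts >= 0
    res = linprog(c, A_eq=A_eq, b_eq=Y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]

rng = np.random.default_rng(2)
n = 600
x = rng.uniform(0.0, 2.0, n)
Y = 1.0 + 2.0 * x + (1.0 + x) * rng.normal(size=n)   # heteroskedastic: slopes vary with u
X = np.column_stack([np.ones(n), x])
b_low = quantile_regression(Y, X, 0.25)               # true slope 2 - 0.674
b_high = quantile_regression(Y, X, 0.75)              # true slope 2 + 0.674
print(b_low, b_high)
```

With only a constant as regressor the LP returns a sample quantile, which is a convenient sanity check.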

Example 3 continued. Duration regression. Consider the transformation model for the conditional distribution function stated in equation (2.9). A common duration model that gives rise to this specification is the proportional hazard model of Cox (1972), where the conditional hazard rate of an individual with covariate vector $x$ is $\lambda_Y(y \mid x) = \lambda_0(y)\exp(x'\beta_0)$, $\beta_0$ is a $p$-vector of regression coefficients, $\lambda_0$ is the nonnegative baseline hazard rate function, and $y \in \mathcal{Y} = [0, \bar y]$ for some maximum duration $\bar y$. Let $\Lambda_0(y) = \int_0^y \lambda_0(s)\,ds$ denote the integrated baseline hazard function. Then $F_Y(y \mid x) = 1 - \exp\{-\exp(x'\beta_0 + \ln\Lambda_0(y))\}$, delivering the transformation model (2.9) with $t(y) = \ln\Lambda_0(y)$ and $m(x) = x'\beta_0$.

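In order to discuss estimation, let us assume i.i.d. sampling of $(Y_i, X_i)$ without censoring. Then Cox's (1972) partial maximum likelihood estimator of $\beta_0$ takes the form

$$\hat\beta = \arg\max_{\beta} \sum_{i=1}^n \int_{\mathcal{Y}} \log\Big[J_i(y)\exp(x_i'\beta) \Big/ \sum_{j=1}^n J_j(y)\exp(x_j'\beta)\Big]\, dN_i(y),$$

and the Breslow-Nelson-Aalen estimator of $\Lambda_0$ takes the form

$$\hat\Lambda(y) = \sum_{i=1}^n \int_0^y \frac{dN_i(s)}{\sum_{j=1}^n J_j(s)\exp(x_j'\hat\beta)},$$

where $N_i(y) := 1\{Y_i \le y\}$ and $J_i(y) := 1\{Y_i \ge y\}$, $y \in \mathcal{Y}$; see Breslow (1972, 1974).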
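A minimal sketch of the Breslow-Nelson-Aalen step and the implied conditional CDF, assuming uncensored data without ties and a coefficient vector `beta` already obtained from a Cox partial-likelihood fit (the maximization itself is omitted). The function names are hypothetical.

```python
import numpy as np

def breslow_cumhaz(times, X, beta):
    """Lambda_hat(y) = sum_i int_0^y dN_i(s) / sum_j J_j(s) exp(x_j'beta), evaluated at
    the sorted event times (uncensored data, no ties assumed)."""
    order = np.argsort(times)
    t = times[order]
    risk = np.exp(X[order] @ beta)
    # Risk set at the k-th smallest time: subjects ranked k, ..., n, i.e. J_j(t_(k)) = 1.
    at_risk = np.cumsum(risk[::-1])[::-1]
    return t, np.cumsum(1.0 / at_risk)

def cox_conditional_cdf(y, x, beta, t, cumhaz):
    """F_hat(y|x) = 1 - exp{-exp(x'beta) Lambda_hat(y)}, with Lambda_hat a right-continuous
    step function (this equals the log form in the text whenever Lambda_hat(y) > 0)."""
    idx = np.searchsorted(t, y, side="right") - 1
    lam = cumhaz[idx] if idx >= 0 else 0.0
    return 1.0 - np.exp(-np.exp(np.dot(x, beta)) * lam)

times = np.array([3.0, 1.0, 2.0])
X = np.zeros((3, 1))
beta = np.zeros(1)
t, cumhaz = breslow_cumhaz(times, X, beta)     # beta = 0 reduces to Nelson-Aalen
F_25 = cox_conditional_cdf(2.5, np.zeros(1), beta, t, cumhaz)
print(cumhaz, F_25)
```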

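Let $W$ denote a standard Brownian motion on $\mathcal{Y}$ and let $Z$ denote an independent $p$-dimensional standard normal vector. Andersen and Gill (1982) show that

$$\sqrt{n}\,\big(\hat\beta - \beta_0,\ \hat\Lambda(y) - \Lambda_0(y)\big) \Rightarrow \big(\Sigma^{-1/2}Z,\ W(a(y)) - b(y)'\Sigma^{-1/2}Z\big)$$

in $\mathbb{R}^p \times \ell^\infty(\mathcal{Y})$, with the terms $a(y)$, $b(y)$, and $\Sigma$, and the regularity conditions, defined in Andersen and Gill (1982) and Burr and Doss (1993). Let $\hat F_Y(y \mid x) = 1 - \exp\{-\exp(x'\hat\beta + \log\hat\Lambda(y))\}$ be the estimator of $F_Y(y \mid x)$. Since $F_Y(y \mid x)$ is Hadamard-differentiable in $(\beta, \Lambda)$, by the functional delta method we have the functional central limit theorem

$$\sqrt{n}\,\big(\hat F_Y(y \mid x) - F_Y(y \mid x)\big) \Rightarrow \{1 - F_Y(y \mid x)\}\{\exp(x'\beta_0)W(a(y)) + b(y,x)'\Sigma^{-1/2}Z\} =: Z(y,x),$$

in $\ell^\infty(\mathcal{Y} \times \mathcal{X})$, where $b(y,x) = \Lambda_Y(y \mid x)x - \exp(x'\beta_0)b(y)$, with $\Lambda_Y(y \mid x) := \Lambda_0(y)\exp(x'\beta_0)$ the conditional integrated hazard, and $Z(y,x)$ is a zero mean Gaussian process with covariance function, for $y \le \tilde y$,

$$\Sigma_Z(y,x,\tilde y,\tilde x) = \{1 - F_Y(y \mid x)\}\{1 - F_Y(\tilde y \mid \tilde x)\}\{\exp(x'\beta_0)\exp(\tilde x'\beta_0)a(y) + b(y,x)'\Sigma^{-1}b(\tilde y,\tilde x)\}.$$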
In Appendix F we also discuss another estimator of this model.

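Example 4 continued. Distribution regression. Consider the model $F_Y(y \mid x) = \Lambda(x'\beta_0(y))$ for the conditional distribution function, where $\Lambda$ is a known link function, such as the logistic or normal distribution. We can estimate the function $\beta_0(y)$ by applying maximum likelihood to the indicator variables $1\{Y \le y\}$ for each value of $y \in \mathcal{Y}$ separately. In Appendix F, we prove that the resulting estimator $\hat\beta(y)$ of $\beta_0(y)$ obeys a functional central limit theorem

$$\sqrt{n}\,\big(\hat\beta(y) - \beta_0(y)\big) \Rightarrow -G_0(y)^{-1}Z(y),$$

where $G_0(y) := G(\beta_0(y), y) = E\big[\lambda[X'\beta_0(y)]^2 XX' / \{\Lambda[X'\beta_0(y)](1 - \Lambda[X'\beta_0(y)])\}\big]$, $\lambda$ is the derivative of $\Lambda$, and $Z(y)$ is a zero mean Gaussian process with covariance function

$$\Omega(y,\tilde y) = E\big[XX'\lambda[X'\beta_0(y)]\lambda[X'\beta_0(\tilde y)] / \{\Lambda[X'\beta_0(y)](1 - \Lambda[X'\beta_0(\tilde y)])\}\big],$$

for $y \ge \tilde y$. Hence the resulting estimator $\hat F_Y(y \mid x) := \Lambda(x'\hat\beta(y))$ of the conditional distribution function also obeys the functional central limit theorem,

$$\sqrt{n}\,\big(\hat F_Y(y \mid x) - F_Y(y \mid x)\big) \Rightarrow -\lambda[x'\beta_0(y)]\,x'G_0(y)^{-1}Z(y) =: Z(y,x),$$

in $\ell^\infty(\mathcal{Y} \times \mathcal{X})$, where $Z(y,x)$ is a zero mean Gaussian process with covariance function

$$\Sigma_Z(y,x,\tilde y,\tilde x) = \lambda[x'\beta_0(y)]\lambda[\tilde x'\beta_0(\tilde y)]\,x'G_0(y)^{-1}\Omega(y,\tilde y)G_0(\tilde y)^{-1}\tilde x.$$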
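A minimal sketch of distribution regression with a logistic link, assuming a design matrix that contains a constant and thresholds strictly inside the range of $Y$ (at a threshold outside that range the logit separates perfectly). Each threshold gets its own Newton-Raphson logit fit; the helper names are ours.

```python
import numpy as np

def logit_fit(X, d, iters=50, tol=1e-10):
    """Maximum likelihood logit of the 0/1 vector d on X by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        step = np.linalg.solve((X * (p * (1.0 - p))[:, None]).T @ X, X.T @ (d - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def distribution_regression(Y, X, y_grid):
    """One logit of 1{Y <= y} on X per threshold y; row r holds beta_hat(y_grid[r])."""
    return np.array([logit_fit(X, (Y <= y).astype(float)) for y in y_grid])

rng = np.random.default_rng(3)
Y = rng.normal(size=2000)
X = np.ones((2000, 1))                       # intercept only, as a sanity check
y_grid = np.array([-1.0, 0.0, 1.0])
B = distribution_regression(Y, X, y_grid)
F_fit = 1.0 / (1.0 + np.exp(-B[:, 0]))       # Lambda(x'beta_hat(y)) at x = 1
print(F_fit)
```

With only a constant, the fitted $\Lambda(\hat\beta(y))$ reproduces the empirical distribution function at each threshold, since the logit score equations then reduce to matching the sample share.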

3.3. Basic principles underlying the limit theory. The derivation of the limit theory 
for policy estimators relies on several basic principles that allow us to link the properties 
of the estimators of conditional (quantile and distribution) functions with the properties of 
estimators of marginal functions. First, although there does not exist a direct connection 
between conditional and marginal quantiles, we can always switch from conditional quantiles to conditional distributions using Lemma 1, then use the law of iterated expectations to go from conditional distribution to marginal distribution, and finally get to marginal quantiles by inverting. Second, as the functionals of interest depend on the entire conditional function, we must rely on the functional delta method to obtain the limit theory for these functionals as well as to obtain intermediate limit results such as Lemma 1. Since the
estimated conditional distributions and quantile functions are usually non-monotone and 
discontinuous in finite samples, we must use refined forms of the functional delta method. 
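The chain described above (conditional distribution, then marginal distribution by averaging over covariates, then marginal quantile by inversion) can be sketched as follows. The sketch assumes a conditional CDF that can be evaluated on a covariate sample; the logistic conditional model and the two-point covariate sample are purely illustrative.

```python
import numpy as np

def marginal_cdf(cond_cdf, y_grid, x_sample, weights=None):
    """Law of iterated expectations: F(y) = int F(y|x) dF_X(x), with F_X replaced by
    the (optionally weighted) empirical distribution of x_sample."""
    return np.array([np.average(cond_cdf(y, x_sample), weights=weights) for y in y_grid])

def marginal_quantile(F_values, y_grid, u):
    """Left inverse Q(u) = min{y in y_grid : F(y) >= u}. Sorting first applies the
    rearrangement on the grid, so the inverse is well defined."""
    F_sorted = np.sort(F_values)
    idx = np.searchsorted(F_sorted, u, side="left")
    return y_grid[np.minimum(idx, len(y_grid) - 1)]

# Illustrative conditional model: Y | x is logistic with location x.
cond_cdf = lambda y, x: 1.0 / (1.0 + np.exp(-(y - x)))
x_sample = np.array([-1.0, 1.0])          # symmetric sample, so the marginal median is 0
y_grid = np.linspace(-6.0, 6.0, 1201)     # step 0.01
F = marginal_cdf(cond_cdf, y_grid, x_sample)
median = marginal_quantile(F, y_grid, 0.5)
print(median)
```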

Accordingly, the key ingredient in the derivation and one of the main theoretical con- 
tributions of the paper is the demonstration of the Hadamard differentiability of the func- 
tionals of interest with respect to the limit of the conditional processes, tangentially to the 
subspace of continuous functions. Indeed, we need this refined form of differentiability to 
deal with our conditional processes, which typically are discontinuous random functions in
finite samples yet converge to continuous random functions in large samples. These refined 
differentiability results in turn enable us to use the functional delta method to derive all 
of the following limit distribution and inference theory. 

3.4. Limit theory for counterfactual distribution and quantile functions. Our first main result shows that the estimators of the marginal distribution and quantile functions before and after the policy intervention satisfy a functional central limit theorem.

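Theorem 1 (Limit distribution for marginal distribution functions). Under Conditions M and D, the estimators $\hat F_{Y_j^k}(y)$ of the marginal distribution functions $F_{Y_j^k}(y)$ jointly converge in law to the following Gaussian processes:

$$\sqrt{n}\,\big(\hat F_{Y_j^k}(y) - F_{Y_j^k}(y)\big) \Rightarrow \sqrt{\lambda_j}\int_{\mathcal{X}} Z_j(y,x)\, dF_{X_k}(x) =: \sqrt{\lambda_j}\, Z_j^k(y), \qquad j,k \in \{0,1\}, \tag{3.3}$$

in $\ell^\infty(\mathcal{Y})$, where $y \mapsto Z_j^k(y)$, $j,k \in \{0,1\}$, have zero mean and covariance function, for $j,k,r,s \in \{0,1\}$,

$$\Sigma_{Z_{jr}^{ks}}(y,\tilde y) := E[Z_j^k(y)Z_r^s(\tilde y)] = \int_{\mathcal{X}}\int_{\mathcal{X}} \Sigma_{Z_{jr}}(y,x,\tilde y,\tilde x)\, dF_{X_k}(x)\, dF_{X_s}(\tilde x). \tag{3.4}$$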

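Theorem 2 (Limit distribution for marginal quantile functions). Under Conditions M, C, and D, the estimators $\hat Q_{Y_j^k}(u)$ of the marginal quantile functions $Q_{Y_j^k}(u)$ jointly converge in law to the following Gaussian processes:

$$\sqrt{n}\,\big(\hat Q_{Y_j^k}(u) - Q_{Y_j^k}(u)\big) \Rightarrow -\sqrt{\lambda_j}\, Z_j^k\big(Q_{Y_j^k}(u)\big)\big/f_{Y_j^k}\big(Q_{Y_j^k}(u)\big) =: \sqrt{\lambda_j}\, V_j^k(u), \qquad j,k \in \{0,1\}, \tag{3.5}$$

in $\ell^\infty((0,1))$, where $f_{Y_j^k}(y) = \int_{\mathcal{X}} f_{Y_j}(y \mid x)\, dF_{X_k}(x)$, and $u \mapsto V_j^k(u)$, $j,k \in \{0,1\}$, have zero mean and covariance function, for $j,k,r,s \in \{0,1\}$,

$$E[V_j^k(u)V_r^s(\tilde u)] = \frac{E\big[Z_j^k\big(Q_{Y_j^k}(u)\big)\, Z_r^s\big(Q_{Y_r^s}(\tilde u)\big)\big]}{f_{Y_j^k}\big(Q_{Y_j^k}(u)\big)\, f_{Y_r^s}\big(Q_{Y_r^s}(\tilde u)\big)}.$$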
Our second main result shows that the estimators of the marginal quantile and distri- 
bution policy effects also satisfy a functional central limit theorem. 

Corollary 1 (Limit distribution for quantile policy effects). Under Conditions M, C, and D, the estimators of the quantile policy effects converge in law to the following Gaussian processes:

$$\sqrt{n}\,\big(\widehat{QE}_j^k(u) - QE_j^k(u)\big) \Rightarrow \sqrt{\lambda_j}\, V_j^k(u) - \sqrt{\lambda_j}\, V_j^j(u) =: W_j^k(u), \qquad j,k \in \{0,1\}, \tag{3.6}$$

in the space $\ell^\infty((0,1))$, where the processes $u \mapsto W_j^k(u)$, $j,k \in \{0,1\}$, have zero mean and covariance function $\Sigma_{W_{jr}^{ks}}(u,\tilde u) := E[W_j^k(u)W_r^s(\tilde u)]$, for $j,k,r,s \in \{0,1\}$.
Corollary 2 (Limit distribution for distribution policy effects). Under Conditions M and D, the estimators of the distribution policy effects converge in law to the following Gaussian processes:

$$\sqrt{n}\,\big(\widehat{DE}_j^k(y) - DE_j^k(y)\big) \Rightarrow \sqrt{\lambda_j}\, Z_j^k(y) - \sqrt{\lambda_j}\, Z_j^j(y) =: S_j^k(y), \qquad j,k \in \{0,1\}, \tag{3.7}$$

in the space $\ell^\infty(\mathcal{Y})$, where the processes $y \mapsto S_j^k(y)$, $j,k \in \{0,1\}$, have zero mean and covariance function $\Sigma_{S_{jr}^{ks}}(y,\tilde y) := E[S_j^k(y)S_r^s(\tilde y)]$, for $j,k,r,s \in \{0,1\}$.

Our third main result shows that various functionals of the status quo and counterfactual 
marginal distribution and quantile functions satisfy a functional central limit theorem. 

Corollary 3 (Limit distribution for differentiable functionals). Let $H_Y(y) = \phi(y, F_{Y_0^0}, F_{Y_1^1}, F_{Y_0^1}, F_{Y_1^0})$, a functional taking values in $\ell^\infty(\mathcal{Y})$, be Hadamard differentiable in $(F_{Y_0^0}, F_{Y_1^1}, F_{Y_0^1}, F_{Y_1^0})$ tangentially to the subspace of continuous functions, with derivative $(\phi'_{00}, \phi'_{11}, \phi'_{01}, \phi'_{10})$. Then under Conditions M and D the plug-in estimator $\hat H_Y(y)$ defined in Section 2 converges in law to the following Gaussian process:

$$\sqrt{n}\,\big(\hat H_Y(y) - H_Y(y)\big) \Rightarrow \sum_{j,k \in \{0,1\}} \sqrt{\lambda_j}\, \phi'_{jk}\big(y, F_{Y_0^0}, F_{Y_1^1}, F_{Y_0^1}, F_{Y_1^0}\big)\big[Z_j^k\big] =: T_H(y), \tag{3.8}$$

in $\ell^\infty(\mathcal{Y})$, where $y \mapsto T_H(y)$ has zero mean and covariance function $\Sigma_{T_H}(y,\tilde y) := E[T_H(y)T_H(\tilde y)]$.

Examples of functionals covered by Corollary 3 include function-valued parameters,
such as Lorenz curves and Lorenz policy effects, as well as scalar-valued parameters, such 
as Gini coefficients and Gini policy effects (Barrett and Donald, 2009). These examples 
also include quantile and distribution functions of the effect of the policy defined under 
Condition RP; in Appendix C we state the results for these effects separately in order to 
give them some emphasis. 
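A minimal sketch of two such functionals computed from an estimated quantile function on a midpoint grid, assuming a nonnegative outcome (for log-wages one would typically exponentiate first). The Lorenz curve is $L(p) = \int_0^p Q(u)\,du / \int_0^1 Q(u)\,du$ and the Gini coefficient is $1 - 2\int_0^1 L(p)\,dp$; the corresponding policy effects are differences of these quantities between the counterfactual and status quo quantile functions.

```python
import numpy as np

def lorenz_and_gini(q_values):
    """Given Q(u) on the midpoint grid u_i = (i + 0.5)/m of (0, 1), with Q assumed
    nonnegative, return L(p) on p = 1/m, ..., 1 and the Gini coefficient."""
    q = np.asarray(q_values, dtype=float)
    m = q.size
    partial = np.cumsum(q) / m                 # int_0^{i/m} Q(u) du (midpoint rule)
    lorenz = partial / partial[-1]
    L = np.concatenate([[0.0], lorenz])        # prepend L(0) = 0
    area = np.sum((L[1:] + L[:-1]) / 2.0) / m  # trapezoid rule for int_0^1 L(p) dp
    return lorenz, 1.0 - 2.0 * area

m = 1000
u = (np.arange(m) + 0.5) / m
lorenz_u, gini_u = lorenz_and_gini(u)              # Uniform(0, 1): Gini = 1/3
lorenz_c, gini_c = lorenz_and_gini(np.ones(m))     # perfect equality: Gini = 0
print(gini_u, gini_c)
```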

3.5. Uniform inference and resampling methods. We can readily apply the preceding limit distribution results to perform inference on the distributions and quantiles of the outcome before and after the policy at a specific point. For example, Corollary 1 implies that the quantile policy effect estimator for a given quantile index $u$ is asymptotically normal with mean $QE_j^k(u)$ and variance $\Sigma_{W_{jj}^{kk}}(u,u)/n$. We can therefore perform inference on $QE_j^k(u)$ for a particular quantile index $u$ using this normal distribution and replacing $\Sigma_{W_{jj}^{kk}}(u,u)$ by a consistent estimate.
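For a single quantile index the interval is the usual normal one. A sketch with hypothetical inputs, where `var_hat` stands for a consistent estimate of the limit variance at $u$:

```python
from statistics import NormalDist

def pointwise_ci(qe_hat, var_hat, n, level=0.95):
    """Normal-approximation interval for QE(u) at one quantile index u:
    qe_hat +/- z * sqrt(var_hat / n), where var_hat estimates the limit
    variance of sqrt(n) * (QE_hat(u) - QE(u))."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * (var_hat / n) ** 0.5
    return qe_hat - half_width, qe_hat + half_width

lo, hi = pointwise_ci(qe_hat=0.05, var_hat=0.4, n=10_000)   # hypothetical inputs
print(lo, hi)
```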
However, pointwise inference permits looking at the effect of the policy at a specific
point only. This approach might be restrictive for policy analysis where the quantities and 
hypotheses of interest usually involve many points or a continuum of points. That is, the 
entire distribution or quantile function of the observed and counterfactual outcomes is often 
of interest. For example, in order to test hypotheses of the policy having no effect on the 
distribution, having a constant effect throughout the distribution, or having a first order 
dominance effect, we must use the entire outcome distribution, and not only a single specific 
point. Moreover, simultaneous inference corrections to pointwise procedures based on the 
normal distribution, such as Bonferroni-type corrections, can be very conservative for 
simultaneous testing of highly dependent hypotheses, and become completely inadequate 
for testing a continuum of hypotheses. 

A convenient and computationally attractive approach for performing inference on function-valued parameters is to use Kolmogorov-Smirnov type procedures. Some complications arise in our case because the limit processes are non-pivotal, as their covariance functions depend on unknown, though estimable, nuisance parameters. A practical and valid way to deal with non-pivotality is to use resampling and related simulation methods. An attractive feature of our theoretical analysis is that the validity of resampling and simulation methods follows from the Hadamard differentiability of the policy functionals with respect to the underlying conditional functions. Indeed, given that bootstrap and other methods can consistently estimate the limit laws of the estimators of the conditional distribution and quantile functions, they also consistently estimate the limit laws of our policy estimators. This convenient result follows from the preservation of the validity of bootstrap and other resampling methods for estimating laws of Hadamard differentiable functionals; see the corresponding lemma in Appendix A.

Theorem 3 (Validity of bootstrap and other simulation methods for estimating the laws of policy estimators of function-valued parameters). If the bootstrap or any other simulation method consistently estimates the laws of the limit stochastic processes (3.1) and (3.2) for the estimators of the conditional quantile or distribution function, then this method also consistently estimates the laws of the limit stochastic processes (3.3), (3.5), (3.6), (3.7), and (3.8) for policy estimators of marginal distribution and quantile functions and other functionals.



Similar non-pivotality issues arise in a variety of goodness-of-fit problems studied by Durbin and others, 
and are referred to as the Durbin problem by Koenker and Xiao (2002). 
Theorem 3 shows that the bootstrap is valid for estimating the limit laws of various inferential processes, provided that the bootstrap is valid for estimating the limit laws of the (function-valued) estimators of the conditional distribution and quantile functions. This is a reasonable condition, but, to the best of our knowledge, there are no results in the literature that verify it for our principal estimators. Indeed, previous results established the validity of the bootstrap only for estimating the pointwise laws of our principal estimators, which is not sufficient for our purposes. To overcome this difficulty, in Appendix F we prove the validity of the empirical bootstrap and other related methods, such as the Bayesian bootstrap, wild bootstrap, k out of n bootstrap, and subsampling bootstrap, for estimating the laws of function-valued estimators, such as quantile regression and distribution regression processes. These results may be of substantial independent interest.

We can then use Theorem 3 to construct the usual uniform bands and perform inference on the marginal distribution and quantile functions, and various functionals, as described in detail in Chernozhukov and Fernandez-Val (2005) and Angrist, Chernozhukov, and Fernandez-Val (2006). Moreover, if the sample size is large, we can reduce the computational complexity of the inference procedure by resampling the first order approximation to the estimators of the conditional distribution and quantile functions (Chernozhukov and Hansen, 2006); by using subsampling bootstrap (Chernozhukov and Fernandez-Val, 2005); or by simulating the limit processes $Z_j$ or $V_j$, $j \in \{0,1\}$, appearing in expressions (3.1) and (3.2), using multiplier methods (Barrett and Donald, 2003).
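One common way to turn bootstrap draws into a uniform band is the sup-t (Kolmogorov-Smirnov type) construction sketched below. It assumes bootstrap re-estimates of the function are already available on a grid; the standard normal draws only stand in for them so the example runs, and this is a generic variant rather than the exact procedure of the cited papers.

```python
import numpy as np

def sup_t_band(estimate, boot, level=0.95):
    """Uniform band for a function estimated on a grid.
    estimate: shape (G,); boot: shape (B, G) bootstrap re-estimates on the same grid."""
    scale = boot.std(axis=0, ddof=1)                          # pointwise bootstrap s.e.
    t_sup = np.max(np.abs(boot - estimate) / scale, axis=1)   # sup over the grid, per draw
    crit = np.quantile(t_sup, level)                          # uniform critical value
    return estimate - crit * scale, estimate + crit * scale, crit

rng = np.random.default_rng(4)
G, B = 50, 2000
estimate = np.zeros(G)
boot = rng.normal(size=(B, G))          # stand-in for bootstrap draws (hypothetical)
lower, upper, crit = sup_t_band(estimate, boot)
print(crit)
```

The uniform critical value exceeds the pointwise value 1.96, which is the price of covering the whole function simultaneously rather than one point at a time.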

3.6. Incorporating uncertainty about the distribution of the covariates. In the preceding analysis we assumed that we know the distributions of the covariates before and after the policy intervention for the target population. In practice, however, we usually observe such distributions only for individuals in the sample. If the individuals in the sample are the target population, then the previous limit theory is valid for performing inference without any adjustments. If a more general population group is the target population, then the distributions of the covariates need to be estimated, and the previous limit theory needs to be adjusted to take this into account. Here we highlight the main ideas, while in Appendix D we present formal distribution and inference theory.

We begin by assuming that the estimators $x \mapsto \hat F_{X_k}(x)$, $k \in \{0,1\}$, of the covariate distribution functions are well behaved, specifically that they converge jointly in law to Gaussian processes $B_{X_k}$, $k \in \{0,1\}$:

$$\sqrt{n}\,\big(\hat F_{X_k}(x) - F_{X_k}(x)\big) \Rightarrow \sqrt{\lambda_k}\, B_{X_k}(x), \qquad k \in \{0,1\},$$

as rigorously defined in Appendix D.1. This assumption is quite general and holds for conventional estimators such as the empirical distribution under i.i.d. sampling as well as various modifications of conventional estimators, as discussed further in Appendix D. The joint convergence holds trivially in the leading cases where the distribution in group 1 is a known transformation of the distribution in group 0, or when the two distributions are estimated from independent samples.

^Exceptions include Chernozhukov and Hansen (2006) and Chernozhukov and Fernandez-Val (2005), but they looked at forms of subsampling only.

The estimation of the covariate distributions affects the limit distributions of the functionals of interest. Let us consider, for example, the marginal distribution functions. When the covariate distributions are unknown, the plug-in estimators for these functions take the form $\hat F_{Y_j^k}(y) = \int \hat F_{Y_j}(y \mid x)\, d\hat F_{X_k}(x)$, $j,k \in \{0,1\}$. The limit processes for these estimators become

$$\sqrt{n}\,\big(\hat F_{Y_j^k}(y) - F_{Y_j^k}(y)\big) \Rightarrow \sqrt{\lambda_j}\, Z_j^k(y) + \sqrt{\lambda_k}\int F_{Y_j}(y \mid x)\, dB_{X_k}(x),$$

where the familiar first component arises from the estimation of the conditional distribution and the second comes from the estimation of the distributions of the covariates. In Appendix D we discuss further details.

4. Labor Market Institutions and the Distribution of Wages 

The empirical application in this section draws its motivation from the influential article 
by DiNardo, Fortin, and Lemieux (1996, DFL hereafter), which studied the effects of insti- 
tutional and labor market factors on the evolution of the U.S. wage distribution between 
1979 and 1988. The goal of our empirical application is to complete and complement 
DFL's analysis by using a wider range of techniques, including quantile regression and 
distribution regression, and to provide confidence intervals for scalar-valued effects as well 
as function-valued effects of the institutional and labor market factors, such as quantile, 
distribution, and Lorenz policy effects. 

We use the same dataset as in DFL, extracted from the outgoing rotation groups of the 
Current Population Surveys (CPS) in 1979 and 1988. The outcome variable of interest 
is the hourly log-wage in 1979 dollars. The regressors include a union status dummy, 
nine education dummies interacted with experience, a quartic term in experience, two 
occupation dummies, twenty industry dummies, and dummies for race, SMSA, marital 
status, and part-time status. Following DFL, we weight the observations by the product
of the CPS sampling weights and the hours worked. We analyze the data for men and 
women separately. 

The major factors suspected to have an important role in the evolution of the wage 
distribution between 1979 and 1988 are the minimum wage, whose real value declined by 
27 percent, the level of unionization, which also declined from 30 percent to 21 percent
in our sample, and the composition of the labor force, whose education levels and other
characteristics have also changed substantially during this period. Thus, following DFL,
we decompose the total change in the US wage distribution into the sum of four effects:
(1) the effect of a change in the minimum wage, (2) the effect of de-unionization, (3) the effect
of changes in the composition of the labor force, and (4) the price effect. The effect (1) 
measures changes in the marginal distribution of wages that occur due to a change in the 
minimum wage; the effects (2) and (3) measure changes in the marginal distribution of 
wages that occur due to a change in the distribution of a particular factor, having fixed 
the distribution of other factors at some constant level; the effect (4) measures changes in 
the marginal distribution of wages that occur due to a change in the wage structure, or 
conditional distribution of wages given worker characteristics. 

Next we formally define these four effects as differences between appropriately chosen counterfactual distribution functions. Let $F_{Y_t,m_s}^{U_r,Z_v}$ denote the counterfactual marginal distribution function of log-wages $Y$ when the wage structure is as in year $t$, the minimum wage, $m$, is at the level observed for year $s$, the distribution of union status, $U$, is as the distribution observed in year $r$, and the distribution of other worker characteristics, $Z$, is as the distribution observed in year $v$. We identify and estimate such counterfactual distributions using the procedures described below. Given these counterfactual distributions, we can decompose the observed total change in the distribution of wages between 1979 and 1988 into the sum of four effects:

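$$\begin{aligned}
F_{Y_{88},m_{88}}^{U_{88},Z_{88}} - F_{Y_{79},m_{79}}^{U_{79},Z_{79}}
&= \underbrace{\big[F_{Y_{88},m_{88}}^{U_{88},Z_{88}} - F_{Y_{88},m_{79}}^{U_{88},Z_{88}}\big]}_{(1)}
+ \underbrace{\big[F_{Y_{88},m_{79}}^{U_{88},Z_{88}} - F_{Y_{88},m_{79}}^{U_{79},Z_{88}}\big]}_{(2)} \\
&\quad + \underbrace{\big[F_{Y_{88},m_{79}}^{U_{79},Z_{88}} - F_{Y_{88},m_{79}}^{U_{79},Z_{79}}\big]}_{(3)}
+ \underbrace{\big[F_{Y_{88},m_{79}}^{U_{79},Z_{79}} - F_{Y_{79},m_{79}}^{U_{79},Z_{79}}\big]}_{(4)}.
\end{aligned} \tag{4.1}$$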

The first component is the effect of the change in the minimum wage, the second is the 
effect of de-unionization, the third is the effect of changes in worker characteristics, and 
the fourth is the price effect. As stated above, we see that the effects (2) and (3) measure 
changes in the marginal distribution of wages that occur due to a change in the distribution 
of a particular factor, having fixed the distribution of other factors at some constant level. 
The effect (4) captures changes in the wage structure or conditional distribution of wages
given observed characteristics; in particular, it captures the effect of changes in the market 
returns to workers' characteristics, including education and experience. Finally, we discuss 
the interpretation of the minimum wage effect (1) in detail below. 
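The decomposition (4.1) is a telescoping sum, so the four effects add up to the total change by construction. A small sketch, with hypothetical dictionary keys $(t, s, r, v)$ and placeholder arrays in place of estimated distribution functions on a common $y$-grid:

```python
import numpy as np

def sequential_effects(F):
    """Sequential decomposition (4.1). F maps (t, s, r, v) = (wage structure, minimum
    wage, union distribution, other characteristics) years to a distribution function
    evaluated on a common y-grid (keys and layout are illustrative)."""
    chain = [(88, 88, 88, 88), (88, 79, 88, 88), (88, 79, 79, 88),
             (88, 79, 79, 79), (79, 79, 79, 79)]
    names = ["minimum wage", "de-unionization", "characteristics", "prices"]
    return {name: F[a] - F[b] for name, a, b in zip(names, chain[:-1], chain[1:])}

rng = np.random.default_rng(5)
keys = [(88, 88, 88, 88), (88, 79, 88, 88), (88, 79, 79, 88),
        (88, 79, 79, 79), (79, 79, 79, 79)]
F = {k: np.sort(rng.uniform(size=10)) for k in keys}    # placeholder CDF values
effects = sequential_effects(F)
total = F[(88, 88, 88, 88)] - F[(79, 79, 79, 79)]
print({name: np.round(val[:3], 3) for name, val in effects.items()})
```

Because each intermediate counterfactual enters once with a plus sign and once with a minus sign, changing the sequential order changes the individual effects but not their sum.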

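The decomposition (4.1) is the distribution version of the Oaxaca-Blinder decomposition for the mean. We obtain similar decompositions for other functionals $\phi(F_{Y_t,m_s}^{U_r,Z_v})$ of interest, such as marginal quantiles and Lorenz curves, by making an appropriate substitution in equation (4.1):

$$\begin{aligned}
\phi\big(F_{Y_{88},m_{88}}^{U_{88},Z_{88}}\big) - \phi\big(F_{Y_{79},m_{79}}^{U_{79},Z_{79}}\big)
&= \underbrace{\big[\phi\big(F_{Y_{88},m_{88}}^{U_{88},Z_{88}}\big) - \phi\big(F_{Y_{88},m_{79}}^{U_{88},Z_{88}}\big)\big]}_{(1)}
+ \underbrace{\big[\phi\big(F_{Y_{88},m_{79}}^{U_{88},Z_{88}}\big) - \phi\big(F_{Y_{88},m_{79}}^{U_{79},Z_{88}}\big)\big]}_{(2)} \\
&\quad + \underbrace{\big[\phi\big(F_{Y_{88},m_{79}}^{U_{79},Z_{88}}\big) - \phi\big(F_{Y_{88},m_{79}}^{U_{79},Z_{79}}\big)\big]}_{(3)}
+ \underbrace{\big[\phi\big(F_{Y_{88},m_{79}}^{U_{79},Z_{79}}\big) - \phi\big(F_{Y_{79},m_{79}}^{U_{79},Z_{79}}\big)\big]}_{(4)}.
\end{aligned} \tag{4.2}$$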

In constructing the decompositions (4.1) and (4.2), we follow the same sequential order as in DFL. Also, like DFL, we follow a partial equilibrium approach, but, unlike DFL, we do not incorporate supply and demand factors in our analysis because they do not fit well in our framework.

We next describe how to identify and estimate the various counterfactual distributions appearing in (4.1). The first counterfactual distribution we need is $F_{Y_{88},m_{79}}^{U_{88},Z_{88}}$, the distribution of wages that we would observe in 1988 if the real minimum wage were as high as in 1979. Identifying this quantity requires additional assumptions. Following DFL, the first strategy we employ is to assume that the conditional wage density at or below the minimum wage depends only on the value of the minimum wage, and that the minimum wage has no employment effects and no spillover effects on wages above its level. The second strategy we employ completely avoids modeling the conditional wage distribution below the minimum wage by simply censoring the observed wages below the minimum wage at the value of the minimum wage. Under the first strategy, DFL show that

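$$F_{Y_{88},m_{79}}(y \mid u,z) = \begin{cases} \dfrac{F_{Y_{88},m_{88}}(m_{79} \mid u,z)}{F_{Y_{79},m_{79}}(m_{79} \mid u,z)}\,F_{Y_{79},m_{79}}(y \mid u,z), & y < m_{79}, \\[1ex] F_{Y_{88},m_{88}}(y \mid u,z), & y \ge m_{79}, \end{cases} \tag{4.3}$$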

where $F_{Y_t,m_s}(y \mid u,z)$ denotes the conditional distribution of wages in year $t$ given worker characteristics when the level of the minimum wage is as in year $s$. Under the second strategy, we have that

$$F_{Y_{88},m_{79}}(y \mid u,z) = \begin{cases} 0, & y < m_{79}, \\ F_{Y_{88},m_{88}}(y \mid u,z), & y \ge m_{79}. \end{cases} \tag{4.4}$$

^The choice of sequential order matters and can affect the relative importance of the four effects. We report some results for the reverse sequential order in the Appendix.

^We cannot identify this quantity from random variation in the minimum wage, since the federal minimum wage does not vary across individuals and varies little across states in the years considered.

Given either (4.3) or (4.4), we identify the counterfactual distribution of wages using the
representation:

$$F^{m79}_{Y\langle 88|88\rangle}(y) = \int F_{Y_{88},m_{79}}(y|u,z)\, dF_{UZ_{88}}(u,z), \qquad (4.5)$$

where $F_{UZ_t}$ is the joint distribution of worker characteristics and union status in year $t$.
We can then estimate this distribution using the plug-in principle. In particular, we estimate
the conditional distributions in expressions (4.3) and (4.4) using one of the regression
methods described below, and the distribution function $F_{UZ_{88}}$ using its empirical analog.
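The constructions (4.3)-(4.5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: conditional distribution functions are passed in as generic callables `F(y, u, z)` (in practice they would come from one of the regression methods described below), the covariate `z` is a scalar for simplicity, and all function names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Conditional CDF as a function of (y, u, z): wage level y, union status u, covariate z.
CondCDF = Callable[[float, int, float], float]


def dfl_counterfactual(F88: CondCDF, F79: CondCDF, m79: float) -> CondCDF:
    """Counterfactual F_{Y88,m79}(y | u, z) under the DFL assumptions, as in (4.3)."""

    def F(y: float, u: int, z: float) -> float:
        if y >= m79:
            # No spillovers above the minimum: keep the observed 1988 distribution.
            return F88(y, u, z)
        denom = F79(m79, u, z)
        if denom == 0.0:
            # Assumption of this sketch: no 1979 mass below m79 in this cell.
            return 0.0
        # 1979 shape below m79, rescaled to the 1988 mass below m79.
        return F88(m79, u, z) * F79(y, u, z) / denom

    return F


def censored_counterfactual(F88: CondCDF, m79: float) -> CondCDF:
    """Counterfactual after censoring 1988 wages from below at m79, as in (4.4)."""
    return lambda y, u, z: F88(y, u, z) if y >= m79 else 0.0


def plug_in_marginal(F: CondCDF, sample_uz: Sequence[Tuple[int, float]], y: float) -> float:
    """Plug-in version of (4.5): average F(y | u_i, z_i) over the 1988 sample."""
    return sum(F(y, u, z) for u, z in sample_uz) / len(sample_uz)


if __name__ == "__main__":
    # Toy conditional CDFs (uniform wages on [0, 2] in 1979 and [0, 4] in 1988).
    F79 = lambda y, u, z: min(max(y / 2.0, 0.0), 1.0)
    F88 = lambda y, u, z: min(max(y / 4.0, 0.0), 1.0)
    F_dfl = dfl_counterfactual(F88, F79, m79=1.0)
    print(F_dfl(0.5, 0, 0.0), F_dfl(2.0, 0, 0.0))  # 0.125 0.5
```

In the toy example, the 1988 mass below the minimum (0.25) is redistributed according to the 1979 shape below the minimum, so half of it lies below 0.5.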

The other counterfactual marginal distributions we need are

$$F^{m79}_{Y\langle 88|(79,88)\rangle}(y) = \int\!\!\int F_{Y_{88},m_{79}}(y|u,z)\, dF_{U_{79}}(u|z)\, dF_{Z_{88}}(z) \qquad (4.6)$$

and

$$F^{m79}_{Y\langle 88|79\rangle}(y) = \int F_{Y_{88},m_{79}}(y|u,z)\, dF_{UZ_{79}}(u,z). \qquad (4.7)$$

Given either of our assumptions on the minimum wage, all the components of these distributions
are identified, and we can estimate them using the plug-in principle. In particular, we
estimate the conditional distribution $F_{Y_{88},m_{79}}(y|u,z)$ using one of the regression methods
described below, the conditional distribution $F_{U_{79}}(u|z)$, $u \in \{0, 1\}$, using logistic regression,
and $F_{Z_{88}}(z)$ and $F_{UZ_{79}}$ using the empirical distributions.
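A minimal sketch of the plug-in estimator of (4.6), under the assumption that the fitted 1979 union model is available as a callable returning $P(U = 1 | z)$ (for example, a logistic regression fit); the function and variable names are hypothetical, and the coefficients in the usage example are made up. The inner integral over $u$ reduces to a two-point mixture, and the outer integral becomes an average over the 1988 sample of $z$.

```python
import math
from typing import Callable, Sequence

CondCDF = Callable[[float, int, float], float]  # (y, u, z) -> F(y | u, z)


def counterfactual_union(F: CondCDF, p_union: Callable[[float], float],
                         z_sample: Sequence[float], y: float) -> float:
    """Plug-in version of (4.6).

    Inner integral: mix F(y | 1, z) and F(y | 0, z) with weights P(U = 1 | z) and
    1 - P(U = 1 | z) from the 1979 union model. Outer integral: average over the
    1988 sample of z.
    """
    total = 0.0
    for z in z_sample:
        p = p_union(z)
        total += p * F(y, 1, z) + (1.0 - p) * F(y, 0, z)
    return total / len(z_sample)


if __name__ == "__main__":
    # Hypothetical 1979 union model P(U = 1 | z) = logistic(-1 + 0.5 z).
    p79 = lambda z: 1.0 / (1.0 + math.exp(-(-1.0 + 0.5 * z)))
    F = lambda y, u, z: 0.3 if u == 1 else 0.6  # toy conditional CDF values at y
    print(counterfactual_union(F, p79, [0.0, 1.0, 2.0], y=1.0))
```

The estimator of (4.7) is the same kind of average, taken directly over the observed 1979 pairs $(u_i, z_i)$.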

Formulas (4.5)-(4.7) giving the expressions for the counterfactual distributions reflect
the assumptions that give the counterfactual distributions a formal causal interpretation.
Indeed, we assume in (4.6) and (4.7) that we can fix the relevant conditional distributions
and change only the marginal distributions of the relevant covariates. In (4.5), we also
specify how the conditional distribution of wages changes with the level of the minimum
wage. Note that we directly observe the marginal distributions appearing on the left side
of the decomposition (4.1) and estimate them using the plug-in principle.

To estimate the conditional distributions of wages we consider three different regression 
methods: classical regression, linear quantile regression, and distribution regression with 
a logit link. The classical regression, despite its wide use in the literature, is not appro- 
priate in this application due to substantial conditional heteroscedasticity in log wages 
(Lemieux, 2006, and Angrist, Chernozhukov, and Fernandez-Val, 2006). The linear quantile
regression is more flexible, but it also has shortcomings in this application. First,
there is a considerable amount of rounding, especially at the level of the minimum wage,
which makes the wage variable highly discrete. Second, a linear model for the conditional 
quantile function may not provide a good approximation to the conditional quantiles near 
the minimum wage, where the conditional quantile function may be highly nonlinear. The 
distribution regression approach does not suffer from these problems, and we therefore 
employ it to generate the main empirical results. In order to check the robustness of 
our empirical results, we also employ the censoring approach described above. We set 
the wages below the minimum wage to the value of the minimum wage and then apply 
censored quantile and distribution regressions to the resulting data. In what follows, we 
first present the empirical results obtained using distribution regression, and then briefly 
compare them with the results obtained using censored quantile regression and censored 
distribution regression. 
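As a rough illustration of distribution regression with a logit link, the sketch below fits, for each threshold $t$ in a grid, a separate logit of the indicator $1\{Y \leq t\}$ on a constant and one scalar covariate by Newton-Raphson, and then reads off $\hat F(t|x) = \Lambda(a(t) + b(t)x)$. It is a pure-Python sketch under simplifying assumptions (a single covariate, no rearrangement of possibly non-monotone estimates across thresholds, both outcomes present and no perfect separation at every threshold); it is not the specification used for the reported results, and the function names are hypothetical.

```python
from __future__ import annotations

import math
from typing import Dict, Sequence, Tuple


def logistic(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logit(x: Sequence[float], d: Sequence[int], iters: int = 50) -> Tuple[float, float]:
    """Logit MLE of P(d = 1 | x) = logistic(a + b x) by Newton-Raphson.

    Assumes both outcomes occur and there is no perfect separation.
    """
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            p = logistic(a + b * xi)
            w = p * (1.0 - p)
            g0 += di - p          # score for a
            g1 += (di - p) * xi   # score for b
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += I^{-1} score, with I the 2x2 information matrix
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def distribution_regression(y: Sequence[float], x: Sequence[float],
                            thresholds: Sequence[float]) -> Dict[float, Tuple[float, float]]:
    """One logit of 1{Y <= t} on (1, x) for each threshold t."""
    return {t: fit_logit(x, [1 if yi <= t else 0 for yi in y]) for t in thresholds}


def conditional_cdf(coefs: Dict[float, Tuple[float, float]], t: float, x: float) -> float:
    """Estimated F(t | x) = logistic(a(t) + b(t) x)."""
    a, b = coefs[t]
    return logistic(a + b * x)
```

Because each threshold gets its own coefficients, the model can capture heteroscedasticity and discreteness that a single location-shift regression cannot.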

We present our empirical results in Tables 1-3 and Figures 1-9. In Figure 1, we compare 
the empirical distributions of wages in 1979 and 1988. In Table 1, we report the estimation 
and inference results for the decomposition (4.2) of the changes in various measures of wage
dispersion between 1979 and 1988 estimated using distribution regressions. Figures 2-7
refine these results by presenting estimates and 95% simultaneous confidence intervals
for several major functionals of interest, including the effects on entire quantile functions, 
distribution functions, and Lorenz curves. We construct the simultaneous confidence bands 
using 100 bootstrap replications and a grid of quantile indices {0.02, 0.021, 0.98}. We 
plot all of these function-valued effects against the quantile indices of wages. In Tables 2-3 
and Figures 8-9, we present the estimates of the same effects as in Table 1 and Figures 
2-3 estimated using various alternative methods, such as censored quantile regression and 
censored distribution regression. Overall, we find that our estimates, confidence intervals, 
and robustness checks all reinforce the findings of DFL, giving them a rigorous econometric 
foundation. Indeed, we provide standard errors and confidence intervals, without which 
we would not be able to assess the statistical significance of the results. Moreover, we 
validate the results with a wide array of estimation methods. In what follows, we
discuss each of our results in more detail. 
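One common way to turn bootstrap replications into a simultaneous band over a grid of quantile indices is a sup-t construction: standardize the bootstrap deviations at each grid point, take the largest absolute standardized deviation across the grid in each replication, and use the empirical $(1-\alpha)$ quantile of these maxima as a common critical value. The sketch below assumes this construction for illustration only; the exact standardization behind the reported bands is not spelled out in this section, and `sup_t_band` is a hypothetical name.

```python
from __future__ import annotations

import math
import statistics
from typing import List, Sequence, Tuple


def sup_t_band(estimate: Sequence[float], draws: Sequence[Sequence[float]],
               alpha: float = 0.05) -> Tuple[List[float], List[float]]:
    """Uniform band over a grid from bootstrap draws of a function-valued estimate.

    estimate: point estimate at each grid point.
    draws: B bootstrap replications, each evaluated on the same grid.
    """
    k = len(estimate)
    # Bootstrap standard error at each grid point
    se = [statistics.pstdev([d[j] for d in draws]) for j in range(k)]
    # Largest standardized deviation across the grid, one value per replication
    t_max = sorted(
        max(abs(d[j] - estimate[j]) / se[j] for j in range(k) if se[j] > 0)
        for d in draws
    )
    # Empirical (1 - alpha) quantile of the maxima (an order statistic)
    idx = min(len(t_max) - 1, math.ceil((1 - alpha) * len(t_max)) - 1)
    c = t_max[idx]
    lower = [estimate[j] - c * se[j] for j in range(k)]
    upper = [estimate[j] + c * se[j] for j in range(k)]
    return lower, upper
```

Using one critical value for the whole grid is what makes the band simultaneous: with probability approximately $1-\alpha$, the entire function lies inside it, rather than each point separately.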

In Figure 1, we present estimates and uniform confidence intervals for the marginal 
distributions of wages in 1979 and 1988. We see that the low end of the distribution is 
significantly lower in 1988 while the upper end is significantly higher in 1988. This pattern
reflects the well-known increase in wage inequality during this period. Next we turn to the
decomposition of the total change into the sum of the four effects. For this decomposition
we focus mostly on quantile functions for comparability with recent studies and to facilitate
the interpretation. In Figures 2-3, we present estimates and uniform confidence intervals
for the total change in the marginal quantile function of wages and the four effects that
form a decomposition of this total change. We report the marginal quantile functions in
1979 and 1988 in the top left panels of Figures 2 and 3. In Figures 4-7, we plot analogous
results for the decomposition of the total change in marginal distribution functions and
Lorenz curves.

The estimation results parallel the results presented in DFL. Table A1 in the Appendix gives the
results for the decomposition in reverse order.

From Figures 2 and 3, we see that the contribution of union status to the total change is 
quantitatively small and has a U-shaped effect across the quantile function for men. The 
magnitude and shape of this effect on the marginal quantiles between the first and last
decile sharply contrast with the quantitatively large and monotonically decreasing shape of
the effect of union status on the conditional quantile function for this range of indices
(Chamberlain, 1994), and illustrate the difference between conditional and unconditional
effects. In general, interpreting the unconditional effect of changes in the distribution of
a covariate requires some care, because the covariate may change only over certain parts
of its support. For example, de-unionization cannot affect those who were not unionized
at the beginning of the period, which is 70 percent of the workers; and in our data, the 
unionization declines from 30 to 21 percent, thus affecting only 9 percent of the workers. 
Thus, even though the conditional impact of switching from union to non-union status can 
be quantitatively large, it has a quantitatively small effect on the marginal distribution 
since only 9 percent of the workers are affected. 
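A back-of-the-envelope version of this argument, assuming for illustration that union status is the only covariate and that the conditional wage distributions of union and non-union workers, $F_1$ and $F_0$, stay fixed, writes the marginal distribution as a mixture with union share $p$:

$$F_Y(y) = p\,F_1(y) + (1-p)\,F_0(y) \quad\Longrightarrow\quad F_Y^{(p=0.21)}(y) - F_Y^{(p=0.30)}(y) = (0.21 - 0.30)\,[F_1(y) - F_0(y)] = -0.09\,[F_1(y) - F_0(y)].$$

Even a large gap $F_1(y) - F_0(y)$ therefore enters the marginal distribution with a weight of only 0.09.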

Discreteness of wage data implies that the quantile functions have jumps. To avoid this erratic
behavior in the graphical representations of the results, we display smoothed quantile functions.
The non-smoothed results are available from the authors. The quantile functions were smoothed
using a bandwidth of 0.015 and a Gaussian kernel. The results in Tables 1-3 and A1 have not
been smoothed.

We find estimates similar to those of Chamberlain (1994) for the effect of union status on the
conditional quantile function in our CPS data.

From Figures 2 and 3, we also see that the change in the distribution of worker characteristics
(other than union status) is responsible for a large part of the increase in wage
inequality in the upper tail of the distribution. The importance of these composition effects
has been recently stressed by Lemieux (2006) and Autor, Katz and Kearney (2008). The
composition effect is realized through at least two channels. The first channel operates
through between-group inequality. In our case, higher educated and more experienced
workers earn higher wages. By increasing their proportion, we induce a larger gap between
the lower and upper tails of the marginal wage distribution. The second channel
is that within-group inequality varies by group, so increasing the proportion of high-variance
groups increases the dispersion in the marginal distribution of wages. In our case,
higher educated and more experienced workers exhibit higher within-group inequality. By
increasing their proportion, we induce higher inequality within the upper tail of the
distribution. To understand the effect of these channels on wage dispersion, it is useful to
consider a linear quantile model $Y = X'\beta(U)$, where $X$ is independent of $U$. By the law
of total variance, we can decompose the variance of $Y$ into:

$$\mathrm{Var}[Y] = E[\beta(U)]'\,\mathrm{Var}[X]\,E[\beta(U)] + \operatorname{trace}\{E[XX']\,\mathrm{Var}[\beta(U)]\}. \qquad (4.8)$$

The first channel corresponds to changes in the first term of (4.8), where $\mathrm{Var}[X]$ represents
the heterogeneity of the labor force (between-group inequality); whereas the second channel
corresponds to changes in the second term of (4.8), operating through the interaction of
between-group inequality $E[XX']$ and within-group inequality $\mathrm{Var}[\beta(U)]$.
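Because (4.8) is an identity for any joint distribution with $X$ independent of $U$, it can be checked exactly on a discrete toy model. The sketch below uses hypothetical covariate values, probabilities, and coefficient functions (not estimates from the CPS data), puts $U$ on an equally weighted midpoint grid, and returns $\mathrm{Var}[Y]$ together with the between-group and within-group terms of (4.8).

```python
from __future__ import annotations

from typing import Callable, List, Sequence, Tuple

Vec = Sequence[float]


def mean_vec(vs: Sequence[Vec], ws: Sequence[float]) -> List[float]:
    k = len(vs[0])
    return [sum(w * v[j] for v, w in zip(vs, ws)) for j in range(k)]


def cov_mat(vs: Sequence[Vec], ws: Sequence[float]) -> List[List[float]]:
    m = mean_vec(vs, ws)
    k = len(m)
    return [[sum(w * (v[a] - m[a]) * (v[b] - m[b]) for v, w in zip(vs, ws))
             for b in range(k)] for a in range(k)]


def variance_terms(xs: Sequence[Vec], px: Sequence[float],
                   beta: Callable[[float], Vec], n_u: int = 200) -> Tuple[float, float, float]:
    """Return (Var[Y], between term, within term) of (4.8) for Y = X'beta(U).

    X takes the values xs with probabilities px; U is uniform on the midpoint grid
    (i + 0.5) / n_u, independent of X, so every moment is an exact finite sum.
    """
    betas = [beta((i + 0.5) / n_u) for i in range(n_u)]
    wu = [1.0 / n_u] * n_u
    eb, vb = mean_vec(betas, wu), cov_mat(betas, wu)
    vx = cov_mat(xs, px)
    k = len(eb)
    exx = [[sum(p * x[a] * x[b] for x, p in zip(xs, px)) for b in range(k)] for a in range(k)]
    between = sum(eb[a] * vx[a][b] * eb[b] for a in range(k) for b in range(k))
    within = sum(exx[a][b] * vb[b][a] for a in range(k) for b in range(k))  # trace(E[XX'] Var[beta])
    ys = [sum(x[j] * b[j] for j in range(k)) for x in xs for b in betas]
    wy = [p * w for p in px for w in wu]
    mean_y = sum(w * y for w, y in zip(wy, ys))
    var_y = sum(w * (y - mean_y) ** 2 for w, y in zip(wy, ys))
    return var_y, between, within
```

Shifting probability in `px` toward covariate values with large $\beta$ coefficients raises the first term, while shifting it toward values where $X'\mathrm{Var}[\beta(U)]X$ is large raises the second, mirroring the two channels described above.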

In Figures 2 and 3, we also include estimates of the price effect. This effect captures 
changes in the conditional wage structure. It represents the difference we would observe 
if the distribution of worker characteristics and union status, and the minimum wage 
remained unchanged during this period. This effect has a U-shaped pattern, which is 
similar to the pattern Autor, Katz and Kearney (2006a) find for the period between 1990 
and 2000. They relate this pattern to a bi-polarization of employment into low and high 
skill jobs. However, they do not find a U-shaped pattern for the period between 1980 and 
1990. A possible explanation for the apparent absence of this pattern in their analysis 
might be that the declining minimum wage masks this phenomenon. In our analysis, once 
we control for this temporary factor, we do uncover the U-shaped pattern for the price 
component in the 1980s.

In Tables 2-3 and Figures 8-9, we present several interesting robustness checks. As we 
mentioned above, the assumptions about the minimum wage are particularly delicate, since 
the mechanism that generates wages strictly below this level is not clear; it could be mea- 
surement error, non-coverage, or non-compliance with the law. To check the robustness of 
the results to the DFL assumptions about the minimum wage and to our semi-parametric 
model of the conditional distribution, we re-estimate the decomposition using censored 
linear quantile regression and censored distribution regression with a logit link, using the 
wage data censored below the minimum wage. For censored quantile regression, we use 
Powell's (1986) censored quantile regression estimated using Chernozhukov and Hong's
(2002) algorithm. For censored distribution regression, we simply censor to zero the
distribution regression estimates of the conditional distributions below the minimum wage and
recompute the functionals of interest. Overall, we find the results are very similar for the
quantile and distribution regressions, and they are not very sensitive to the censoring.

We have additional results on quantile, distribution and Lorenz effects for the censored estimates;
these are available on request from the authors. We do not report them here to save space.

5. Conclusion

This paper develops methods for performing inference about the effect on an outcome of
interest of a change in either the distribution of policy-related variables or the relationship
of the outcome with these variables. The validity of the proposed inference procedures
in large samples relies only on the applicability of a functional central limit theorem for
the estimator of the conditional distribution or conditional quantile function. This condition
holds for most important semiparametric estimators of conditional distribution and
quantile functions, such as classical, quantile, duration, and distribution regressions.

Appendix

This Appendix contains proofs and additional results. Section A collects preliminary
lemmas on the functional delta method and derives the functional delta method for any
simulation method, extending its applicability beyond the bootstrap. Section B collects
the proofs for the results in the main text of the paper. Section C gives limit distribution
theory for policy effects estimators. Section D presents additional results for the case
where the covariate distributions are estimated. These results complement the results in
the main text. Section E derives limit theory, including Hadamard differentiability, for
Z-processes and Section F applies this theory to the principal estimators of conditional
distribution and quantile functions. These results establish the validity of bootstrap and
other resampling schemes for the entire quantile regression process, the entire distribution
regression process, and related processes arising in the estimation of various conditional
quantile and distribution functions. These results may be of substantial independent
interest.

Appendix A. Functional Delta Method, Bootstrap, and Other Methods 

This section collects preliminary lemmas on the functional delta method and derives the 
functional delta method for any simulation method, extending its applicability beyond the 
bootstrap. 

A.1. Some definitions and auxiliary results. We begin by quickly recalling from van
der Vaart and Wellner (1996) the details of the functional delta method. 

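Definition 1 (Hadamard-differentiability). Let $\mathbb{D}_0$, $\mathbb{D}$, and $\mathbb{E}$ be normed spaces, with
$\mathbb{D}_0 \subset \mathbb{D}$. A map $\phi: \mathbb{D}_\phi \subset \mathbb{D} \mapsto \mathbb{E}$ is called Hadamard-differentiable at $\theta \in \mathbb{D}_\phi$ tangentially
to $\mathbb{D}_0$ if there is a continuous linear map $\phi'_\theta: \mathbb{D}_0 \mapsto \mathbb{E}$ such that

$$\frac{\phi(\theta + t_n h_n) - \phi(\theta)}{t_n} \to \phi'_\theta(h), \qquad n \to \infty,$$

for all sequences $t_n \to 0$ and $h_n \to h \in \mathbb{D}_0$ such that $\theta + t_n h_n \in \mathbb{D}_\phi$ for every $n$.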

This notion works well together with the continuous mapping theorem. 

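Lemma 2 (Extended continuous mapping theorem). Let $\mathbb{D}_n \subset \mathbb{D}$ be arbitrary subsets
and $g_n: \mathbb{D}_n \mapsto \mathbb{E}$ be arbitrary maps ($n \geq 0$), such that for every sequence $x_n \in \mathbb{D}_n$:
if $x_{n'} \to x \in \mathbb{D}_0$ along a subsequence, then $g_{n'}(x_{n'}) \to g_0(x)$. Then, for arbitrary maps
$X_n: \Omega_n \mapsto \mathbb{D}_n$ and every random element $X$ with values in $\mathbb{D}_0$ such that $g_0(X)$ is a random
element in $\mathbb{E}$:

(i) If $X_n \rightsquigarrow X$, then $g_n(X_n) \rightsquigarrow g_0(X)$;

(ii) If $X_n \to_{P^*} X$, then $g_n(X_n) \to_{P^*} g_0(X)$.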

The combination of the previous definition and lemma is known as the functional delta 
method. 

Lemma 3 (Functional delta-method). Let $\mathbb{D}_0$, $\mathbb{D}$, and $\mathbb{E}$ be normed spaces. Let $\phi: \mathbb{D}_\phi \subset
\mathbb{D} \mapsto \mathbb{E}$ be Hadamard-differentiable at $\theta$ tangentially to $\mathbb{D}_0$. Let $X_n: \Omega_n \mapsto \mathbb{D}_\phi$ be maps with
$r_n(X_n - \theta) \rightsquigarrow X$ in $\mathbb{D}$, where $X$ is separable and takes its values in $\mathbb{D}_0$, for some sequence
of constants $r_n \to \infty$. Then $r_n(\phi(X_n) - \phi(\theta)) \rightsquigarrow \phi'_\theta(X)$. If $\phi'_\theta$ is defined and continuous
on the whole of $\mathbb{D}$, then the sequence $r_n(\phi(X_n) - \phi(\theta)) - \phi'_\theta(r_n(X_n - \theta))$ converges to zero
in outer probability.
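A finite-dimensional special case conveys what Lemma 3 delivers. In the Monte Carlo sketch below (an illustration, not part of the formal argument), $X_n$ is the mean of $n$ unit exponential draws, $\theta = 1$, $r_n = \sqrt{n}$, and $\phi = \log$, so that $\phi'_\theta(h) = h$ and $\sqrt{n}(\log X_n - \log 1)$ should be approximately $N(0, 1)$. The sample size, number of replications, and seed are arbitrary choices.

```python
from __future__ import annotations

import math
import random
import statistics
from typing import Tuple


def delta_method_demo(n: int = 200, reps: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo mean and variance of sqrt(n) * (log(mean of n Exp(1) draws) - log(1)).

    By the delta method with phi = log, theta = 1 and phi'_theta(h) = h, the limit
    is N(0, 1), because sqrt(n) * (mean - 1) is asymptotically N(0, 1).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        draws.append(math.sqrt(n) * (math.log(xbar) - math.log(1.0)))
    return statistics.mean(draws), statistics.variance(draws)


if __name__ == "__main__":
    print(delta_method_demo())  # mean close to 0, variance close to 1
```

In the function-valued setting of the paper, $X_n$ is an estimated conditional distribution or quantile process and $\phi$ is, for example, the integration or quantile operator, but the logic is the same.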

The applicability of the method is greatly enhanced by the fact that Hadamard differ- 
entiation obeys the chain rule. 

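Lemma 4 (Chain rule). If $\phi: \mathbb{D}_\phi \subset \mathbb{D} \mapsto \mathbb{E}_\psi$ is Hadamard-differentiable at $\theta \in \mathbb{D}_\phi$
tangentially to $\mathbb{D}_0$ and $\psi: \mathbb{E}_\psi \subset \mathbb{E} \mapsto \mathbb{F}$ is Hadamard-differentiable at $\phi(\theta)$ tangentially to
$\phi'_\theta(\mathbb{D}_0)$, then $\psi \circ \phi: \mathbb{D}_\phi \mapsto \mathbb{F}$ is Hadamard-differentiable at $\theta$ tangentially to $\mathbb{D}_0$ with
derivative $\psi'_{\phi(\theta)} \circ \phi'_\theta$.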

Another technical result to be used in the sequel concerns the equivalence of continuous
and uniform convergence.

Lemma 5 (Uniform convergence via continuous convergence). Let $\mathbb{D}$ and $\mathbb{E}$ be complete
separable metric spaces, with $\mathbb{D}$ compact. Suppose $f: \mathbb{D} \mapsto \mathbb{E}$ is continuous. Then a
sequence of functions $f_n: \mathbb{D} \mapsto \mathbb{E}$ converges to $f$ uniformly on $\mathbb{D}$ if and only if for any
convergent sequence $x_n \to x$ in $\mathbb{D}$ we have that $f_n(x_n) \to f(x)$.

Proof of Lemmas 2-4. See van der Vaart and Wellner (1996), Chapters 1.11 and 3.9. □
Proof of Lemma 5. See, for example, Resnick (1987), page 2. □

A.2. Functional delta-method for bootstrap and other simulation methods. Let
$\mathcal{F}_n = (W_1, \ldots, W_n)$ denote the data. Consider sequences of random elements
$V_n = V_n(\mathcal{F}_n)$, such as the original empirical process, taking values in a normed space $\mathbb{D}$,
such that the sequence $\sqrt{n}(V_n - V)$ converges unconditionally to the process $\mathbb{G}$. Define the
sequence of random elements

$$V_n^* = V_n + \mathbb{G}_n/\sqrt{m}, \qquad (A.1)$$

where $m = m(n)$ is a possibly random sequence such that $m/m_0 \to_P 1$ for some sequence
of constants $m_0 \to \infty$ such that $m_0/n \to c \geq 0$, and the "draw" $\mathbb{G}_n$ is produced by
bootstrap, simulation, or any other consistent method that guarantees that the sequence
$\mathbb{G}_n$ converges conditionally given $\mathcal{F}_n$ in distribution to a tight random element $\mathbb{G}$,

$$\sup_{h \in BL_1(\mathbb{D})} \big| E_{|\mathcal{F}_n} h(\mathbb{G}_n)^* - E h(\mathbb{G}) \big| \to 0, \qquad (A.2)$$

in outer probability, where $BL_1(\mathbb{D})$ denotes the space of functions with Lipschitz norm at
most 1 and $E_{|\mathcal{F}_n}$ denotes the conditional expectation given the data. In the definition, we
can take $\mathbb{G}$ to be independent of $\mathcal{F}_n$.

The random scaling is needed to cover wild bootstrap, for example.

Given a map $\phi: \mathbb{D}_\phi \subset \mathbb{D} \mapsto \mathbb{E}$, we wish to show that

$$\sup_{h \in BL_1(\mathbb{E})} \big| E_{|\mathcal{F}_n} h\big(\sqrt{m}(\phi(V_n^*) - \phi(V_n))\big)^* - E h\big(\phi'_V(\mathbb{G})\big) \big| \to 0, \qquad (A.3)$$

in outer probability.

Lemma 6 (Delta-method for bootstrap and other simulation methods). Let $\mathbb{D}_0$, $\mathbb{D}$, and
$\mathbb{E}$ be normed spaces, with $\mathbb{D}_0 \subset \mathbb{D}$. Let $\phi: \mathbb{D}_\phi \subset \mathbb{D} \mapsto \mathbb{E}$ be Hadamard-differentiable at $V$
tangentially to $\mathbb{D}_0$. Let $V_n$ and $V_n^*$ be maps as indicated previously with values in $\mathbb{D}_\phi$ such
that $\sqrt{n}(V_n - V) \rightsquigarrow \mathbb{G}$ and (A.2) holds in outer probability, where $\mathbb{G}$ is separable and takes
its values in $\mathbb{D}_0$. Then (A.3) holds in outer probability.

Proof of Lemma 6. The proof generalizes the functional delta-method for empirical
bootstrap in Theorem 3.9.11 of van der Vaart and Wellner (1996) to exchangeable bootstrap.
This expands the applicability of the delta-method to a wide variety of resampling and
simulation schemes that are special cases of exchangeable bootstrap, including empirical
bootstrap, Bayesian bootstrap, wild bootstrap, k out of n bootstrap, and subsampling
bootstrap (see next section for details).

Without loss of generality, assume that the derivative $\phi'_V: \mathbb{D} \mapsto \mathbb{E}$ is defined and continuous
on the whole space. Otherwise, replace $\mathbb{E}$ by its second dual $\mathbb{E}^{**}$ and the derivative
by an extension $\phi'_V: \mathbb{D} \mapsto \mathbb{E}^{**}$. For every $h \in BL_1(\mathbb{E})$, the function $h \circ \phi'_V$ is contained in
$BL_{\|\phi'_V\|}(\mathbb{D})$. Thus (A.2) implies $\sup_{h \in BL_1(\mathbb{E})} |E_{|\mathcal{F}_n} h(\phi'_V(\mathbb{G}_n))^* - E h(\phi'_V(\mathbb{G}))| \to 0$ in outer
probability. Next, for every $\epsilon > 0$,

$$\sup_{h \in BL_1(\mathbb{E})} \big| E_{|\mathcal{F}_n} h\big(\sqrt{m}(\phi(V_n^*) - \phi(V_n))\big)^* - E_{|\mathcal{F}_n} h\big(\phi'_V(\mathbb{G}_n)\big)^* \big| \leq \epsilon + 2 P_{|\mathcal{F}_n}\Big( \big\| \sqrt{m}(\phi(V_n^*) - \phi(V_n)) - \phi'_V\big(\sqrt{m}(V_n^* - V_n)\big) \big\|^* > \epsilon \Big). \qquad (A.4)$$

The lemma is proved once it has been shown that the conditional probability on the right
converges to zero in outer probability.

Both sequences $\sqrt{m}(V_n - V)$ and $\sqrt{m}(V_n^* - V)$ converge (unconditionally) in
distribution to separable random elements that concentrate on the space $\mathbb{D}_0$. The first
sequence converges by assumption and Slutsky's theorem when $m/m_0 \to_P 1$ and $m_0/n \to c > 0$,
and converges to zero when $m_0/n \to 0$, again by assumption and Slutsky's theorem. The
second sequence converges, by noting that

$$\sqrt{m}(V_n^* - V) = \sqrt{m}(V_n^* - V_n) + \sqrt{m}(V_n - V)$$

and that, with $\Delta_n := \sqrt{m}(V_n - V)$,

$$\sup_{h \in BL_1(\mathbb{D})} E\big| E_{|\mathcal{F}_n} h\big(\sqrt{m}(V_n^* - V_n)^* + \Delta_n\big) - E_{|\mathcal{F}_n} h(\mathbb{G} + \Delta_n) \big| \leq E^* \sup_{h \in BL_1(\mathbb{D})} \big| E_{|\mathcal{F}_n} h(\mathbb{G}_n)^* - E_{|\mathcal{F}_n} h(\mathbb{G}) \big|,$$

which converges to zero by (A.2), since the integrand is bounded by 2, and by $E_{|\mathcal{F}_n} h(\mathbb{G}) = E h(\mathbb{G})$
due to the independence of $\mathbb{G}$ from $\mathcal{F}_n$.

By Lemma 3,

$$\sqrt{m}\big(\phi(V_n) - \phi(V)\big) = \phi'_V\big(\sqrt{m}(V_n - V)\big) + o_P^*(1),$$
$$\sqrt{m}\big(\phi(V_n^*) - \phi(V)\big) = \phi'_V\big(\sqrt{m}(V_n^* - V)\big) + o_P^*(1). \qquad (A.5)$$

Subtract these equations to conclude that the sequence $\sqrt{m}(\phi(V_n^*) - \phi(V_n)) - \phi'_V(\sqrt{m}(V_n^* - V_n))$
converges unconditionally to zero in outer probability. Thus, the conditional probability
on the right in (A.4) converges to zero in outer mean. □

A.3. Exchangeable Bootstrap. Let $(W_1, \ldots, W_n)$ denote the i.i.d. data. Next we define
the collection of exchangeable bootstrap methods that we can employ for inference. For
each $n$, let $(e_{n1}, \ldots, e_{nn})$ be an exchangeable, nonnegative random vector. Exchangeable
bootstrap uses the components of this vector as random sampling weights in place of
constant weights $(1, \ldots, 1)$. A simple way to think of exchangeable bootstrap is as sampling
each variable $W_i$ the number of times equal to $e_{ni}$, albeit without requiring $e_{ni}$ to be
integer-valued. Given an empirical process $V_n(f) = \frac{1}{n}\sum_{i=1}^n f(W_i)$, define an exchangeable
bootstrap draw of this process as

$$V_n^*(f) = \frac{1}{n} \sum_{i=1}^n \frac{e_{ni}}{\bar{e}_n} f(W_i),$$

where $\bar{e}_n = \sum_{i=1}^n e_{ni}/n$. This ensures that each draw of $V_n^*$ assigns nonnegative weights to
each observation, which is important in applications of bootstrap to extremum estimators
to preserve convexity of criterion functions. We assume that, for some $\epsilon > 0$,

$$\sup_n E[e_{n1}^{2+\epsilon}] < \infty, \qquad \frac{1}{n}\sum_{i=1}^n (e_{ni} - \bar{e}_n)^2 \to_P 1, \qquad \bar{e}_n^2 \to_P c \geq 0, \qquad (A.6)$$

where the first two conditions are standard (see van der Vaart and Wellner, 1996), and
the last one is needed to apply the previous lemma. Let us consider the following special
cases: (1) The standard empirical bootstrap corresponds to the case where $(e_{n1}, \ldots, e_{nn})$
is a multinomial vector with parameters $n$ and probabilities $(1/n, \ldots, 1/n)$, so that $\bar{e}_n = 1$
and $m = n$. (2) The Bayesian bootstrap corresponds to the case where $U_1, \ldots, U_n$ are i.i.d.
nonnegative random variables, e.g., unit exponential, with $E[U_1^{2+\epsilon}] < \infty$ for some $\epsilon > 0$,
and $e_{ni} = U_i/\bar{U}_n$, so that $\bar{e}_n = 1$ and $m = n$. (3) The wild bootstrap corresponds to the
case where $e_{n1}, \ldots, e_{nn}$ are i.i.d. with $E[e_{n1}^{2+\epsilon}] < \infty$ for some $\epsilon > 0$ and $\mathrm{Var}[e_{n1}] = 1$,
so that $m/n = \bar{e}_n^2 \to_P (E e_{n1})^2 > 0$ and $m_0 = n (E e_{n1})^2 \to \infty$. (4) The $k$ out of $n$ bootstrap
resamples $k < n$ observations from $W_1, \ldots, W_n$ with replacement. This corresponds to
letting $(e_{n1}, \ldots, e_{nn})$ be equal to $\sqrt{n/k}$ times multinomial vectors with parameters $k$ and
probabilities $(1/n, \ldots, 1/n)$. The condition (A.6) on the weights holds if $k \to \infty$, so that
$\bar{e}_n^2 = k/n \to c \geq 0$ and $m = k \to \infty$. (5) The subsampling bootstrap corresponds to
resampling $k < n$ observations from $W_1, \ldots, W_n$ without replacement. This corresponds to
letting $(e_{n1}, \ldots, e_{nn})$ be a row of $k$ times the number $n(n-k)^{-1/2}k^{-1/2}$ and $n - k$ times
the number 0, ordered at random, independent of the $W_i$'s. The condition (A.6) on the
weights holds if both $k \to \infty$ and $n - k \to \infty$. In this case $\bar{e}_n^2 = k/(n - k) \to c \geq 0$ and
$m = nk/(n - k) \to \infty$.
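The weight vectors in cases (1), (2), (4), and (5) can be generated with a few lines of Python. The sketch below is an illustration with our own naming (the scheme labels and function names are hypothetical); the wild bootstrap is omitted because its weight distribution is left generic above. It also implements the weighted draw $V_n^*(f)$, and its numerical checks confirm, for example, that the subsampling weights satisfy $\bar{e}_n^2 = k/(n-k)$ and have unit empirical variance.

```python
from __future__ import annotations

import math
import random
from typing import Callable, List, Sequence


def multinomial_counts(rng: random.Random, n: int, k: int) -> List[int]:
    """Counts from k draws with replacement among n indices."""
    counts = [0] * n
    for _ in range(k):
        counts[rng.randrange(n)] += 1
    return counts


def weights(scheme: str, n: int, rng: random.Random, k: int | None = None) -> List[float]:
    """Exchangeable weights (e_n1, ..., e_nn) for special cases (1), (2), (4), (5)."""
    if scheme == "empirical":        # (1) multinomial(n; 1/n, ..., 1/n)
        return [float(c) for c in multinomial_counts(rng, n, n)]
    if scheme == "bayesian":         # (2) U_i / mean(U), U_i unit exponential
        u = [rng.expovariate(1.0) for _ in range(n)]
        ubar = sum(u) / n
        return [ui / ubar for ui in u]
    if scheme == "k_out_of_n":       # (4) sqrt(n/k) times multinomial(k; 1/n, ...)
        scale = math.sqrt(n / k)
        return [scale * c for c in multinomial_counts(rng, n, k)]
    if scheme == "subsampling":      # (5) k copies of n / sqrt((n-k)k), n-k zeros
        a = n / math.sqrt((n - k) * k)
        e = [a] * k + [0.0] * (n - k)
        rng.shuffle(e)
        return e
    raise ValueError(f"unknown scheme: {scheme}")


def draw(f: Callable[[float], float], data: Sequence[float], e: Sequence[float]) -> float:
    """Exchangeable bootstrap draw V*_n(f) = n^{-1} sum_i (e_ni / ebar_n) f(W_i)."""
    n = len(data)
    ebar = sum(e) / n
    return sum(ei / ebar * f(w) for ei, w in zip(e, data)) / n
```

Because each draw divides by $\bar{e}_n$, the weights $e_{ni}/\bar{e}_n$ always average to one, so $V_n^*$ is a nonnegatively reweighted version of $V_n$.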

As a consequence of Lemma 6, we obtain the following result, which might be of
independent interest.

Lemma 7 (Functional delta method for exchangeable bootstrap). The exchangeable bootstrap
method described above satisfies condition (A.2), and therefore the conclusions of
Lemma 6 about validity of the functional delta method apply to this method.

Proof of Lemma 7. By Lemma 6, we only need to verify condition (A.2), which follows
by Theorem 3.6.13 of van der Vaart and Wellner (1996). □
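As a simple numerical instance of Lemma 7 applied to a Hadamard-differentiable functional, the sketch below bootstraps the sample median with Bayesian bootstrap weights (case (2)) and compares the bootstrap standard deviation with the delta-method value $\sqrt{u(1-u)}/(f(Q(u))\sqrt{n})$ for standard normal data. The sample size, number of draws, and seed are arbitrary, the function names are hypothetical, and the comparison is only approximate in finite samples.

```python
from __future__ import annotations

import math
import random
import statistics
from statistics import NormalDist
from typing import Sequence, Tuple


def weighted_quantile(sorted_x: Sequence[float], w: Sequence[float], u: float) -> float:
    """inf{x : normalized weight on values <= x is >= u}; sorted_x must be ascending."""
    total = sum(w)
    acc = 0.0
    for x, wi in zip(sorted_x, w):
        acc += wi / total
        if acc >= u:
            return x
    return sorted_x[-1]


def bootstrap_vs_delta(n: int = 1000, draws: int = 300, u: float = 0.5,
                       seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    x = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    reps = []
    for _ in range(draws):
        # Bayesian bootstrap: i.i.d. unit exponential weights (normalized inside
        # weighted_quantile); they are independent of the data, so attaching
        # them to the sorted sample is legitimate.
        w = [rng.expovariate(1.0) for _ in range(n)]
        reps.append(weighted_quantile(x, w, u))
    boot_sd = statistics.stdev(reps)
    q = NormalDist().inv_cdf(u)
    # Delta-method standard deviation of the sample u-quantile
    delta_sd = math.sqrt(u * (1.0 - u)) / (NormalDist().pdf(q) * math.sqrt(n))
    return boot_sd, delta_sd
```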

Appendix B. Inference Theory for Counterfactual Estimators (Proofs) 
This section collects the proofs for the results in the main text of the paper. 

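B.1. Notation. Define $Y_x := Q_Y(U|x)$, where $U \sim \mathrm{Uniform}(\mathcal{U})$ with $\mathcal{U} = (0, 1)$. Denote
by $\mathcal{Y}_x$ the support of $Y_x$, $\mathcal{YX} := \{(y,x) : y \in \mathcal{Y}_x, x \in \mathcal{X}\}$, and $\mathcal{UX} := \mathcal{U} \times \mathcal{X}$. We assume
throughout that $\mathcal{Y}_x \subset \mathcal{Y}$, which is a compact subset of $\mathbb{R}$, and that $x \in \mathcal{X}$, a compact subset
of $\mathbb{R}^p$. In what follows, $\ell^\infty(\mathcal{UX})$ denotes the set of bounded and measurable functions
$h: \mathcal{UX} \mapsto \mathbb{R}$, and $C(\mathcal{UX})$ denotes the set of continuous functions $h: \mathcal{UX} \mapsto \mathbb{R}$.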

B.2. Uniform Hadamard differentiability of conditional distribution functions 
with respect to the conditional quantile functions. The following lemma establishes 
the Hadamard differentiability of the conditional distribution function with respect to the 
conditional quantile function. We use this result to prove Lemma 1 in the main text and
to derive the limit distribution for the policy estimators based on conditional quantile 
models. We drop the dependence on the group index to simplify the notation. 
Lemma 8 (Hadamard derivative of $F_Y(y|x)$ with respect to $Q_Y(u|x)$). Define $F_Y(y|x, h_t)
:= \int_0^1 1\{Q_Y(u|x) + t h_t(u|x) \leq y\}\,du$. Under condition C, as $t \searrow 0$,

$$D_{h_t}(y|x,t) = \frac{F_Y(y|x,h_t) - F_Y(y|x)}{t} \to D_h(y|x) := -f_Y(y|x)\,h(F_Y(y|x)|x).$$

The convergence holds uniformly in any compact subset of $\mathcal{YX} := \{(y,x) : y \in \mathcal{Y}_x, x \in \mathcal{X}\}$,
for every $\|h_t - h\|_\infty \to 0$, where $h_t \in \ell^\infty(\mathcal{UX})$ and $h \in C(\mathcal{UX})$.

Proof of Lemma 8. We have that for any $\delta > 0$, there exists $\epsilon > 0$ such that for
$u \in B_\epsilon(F_Y(y|x))$ and for small enough $t > 0$,

$$1\{Q_Y(u|x) + t h_t(u|x) \leq y\} \leq 1\{Q_Y(u|x) + t(h(F_Y(y|x)|x) - \delta) \leq y\},$$

whereas for all $u \notin B_\epsilon(F_Y(y|x))$,

$$1\{Q_Y(u|x) + t h_t(u|x) \leq y\} = 1\{Q_Y(u|x) \leq y\}.$$

Therefore, for small enough $t > 0$,

$$\frac{\int_0^1 1\{Q_Y(u|x) + t h_t(u|x) \leq y\}\,du - \int_0^1 1\{Q_Y(u|x) \leq y\}\,du}{t} \leq \int_{B_\epsilon(F_Y(y|x))} \frac{1\{Q_Y(u|x) + t(h(F_Y(y|x)|x) - \delta) \leq y\} - 1\{Q_Y(u|x) \leq y\}}{t}\,du, \qquad (B.1)$$

which by the change of variable $y' = Q_Y(u|x)$ is equal to

$$\frac{1}{t} \int_{J \cap [y,\, y - t(h(F_Y(y|x)|x) - \delta)]} f_Y(y'|x)\,dy',$$

where $J$ is the image of $B_\epsilon(F_Y(y|x))$ under $u \mapsto Q_Y(u|x)$, and an integral over $[a, b]$ with
$b < a$ is understood as $-\int_{[b,a]}$. The change of variable is possible because $Q_Y(\cdot|x)$ is
one-to-one between $B_\epsilon(F_Y(y|x))$ and $J$.

Fixing $\epsilon > 0$, for $t \searrow 0$, we have that $J \cap [y, y - t(h(F_Y(y|x)|x) - \delta)] = [y, y -
t(h(F_Y(y|x)|x) - \delta)]$, and $f_Y(y'|x) \to f_Y(y|x)$ as $y' \to y$. Therefore, the right-hand
term in (B.1) is no greater than

$$-f_Y(y|x)\big(h(F_Y(y|x)|x) - \delta\big) + o(1).$$

Similarly, $-f_Y(y|x)(h(F_Y(y|x)|x) + \delta) + o(1)$ bounds (B.1) from below. Since $\delta > 0$ can
be made arbitrarily small, the result follows.

To show that the result holds uniformly in $(y,x) \in K$, a compact subset of $\mathcal{YX}$, we use
Lemma 5. Take a sequence $(y_t, x_t)$ in $K$ that converges to $(y, x) \in K$; then the preceding
argument applies to this sequence, since the function $(y,x) \mapsto -f_Y(y|x)h(F_Y(y|x)|x)$ is
uniformly continuous on $K$. This result follows by the assumed continuity of $h(u|x)$,
$f_Y(y|x)$, and $F_Y(y|x)$ in both arguments, and the compactness of $K$. □
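Lemma 8 can be checked numerically in a simple case. In the illustrative sketch below, $Q_Y(\cdot|x)$ is taken to be the standard normal quantile function at a fixed $x$, the direction is $h_t = h$ with $h(u) = u$ (a hypothetical choice), and the difference quotient $D_{h_t}(y|x,t)$ is compared with $-f_Y(y|x)h(F_Y(y|x)|x)$ for a small $t$. The function names are ours.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cdf_from_quantile(y, t, h, iters=200):
    """Measure of {u in (0, 1): Q(u) + t*h(u) <= y}, Q the standard normal quantile function.

    Assumes u -> Q(u) + t*h(u) is increasing, so the set is (0, u*]; u* is found by bisection.
    """
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if STD_NORMAL.inv_cdf(mid) + t * h(mid) <= y:
            lo = mid
        else:
            hi = mid
    return lo


def check_lemma8(y=0.5, t=1e-5, h=lambda u: u):
    """Return (difference quotient D_{h_t}(y|x, t), limit -f(y) h(F(y)))."""
    quotient = (cdf_from_quantile(y, t, h) - cdf_from_quantile(y, 0.0, h)) / t
    derivative = -STD_NORMAL.pdf(y) * h(STD_NORMAL.cdf(y))
    return quotient, derivative


if __name__ == "__main__":
    print(check_lemma8())  # the two numbers agree to several decimals
```

Computing the unperturbed value with the same bisection routine, rather than with `cdf`, keeps numerical error in the inverse from contaminating the quotient.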
B.3. Proof of Lemma 1. This result follows by the Hadamard differentiability of the
conditional distribution function with respect to the conditional quantile function in Lemma
8, Condition Q, and the functional delta method in Lemma 3. □

B.4. Proof of Theorem 1. The joint uniform convergence result follows from Condition
D by the extended continuous mapping theorem in Lemma 2, since the integral is a
continuous operator. Gaussianity of the limit process follows from linearity of the integral. □

B.5. Proof of Theorem 2. The joint uniform convergence result and Gaussianity of the
limit process follow from Theorem 1 by the functional delta method in Lemma 3, since
the quantile operator is Hadamard differentiable (see, e.g., Doss and Gill, 1992). □

B.6. Proof of Corollary 1. This result follows from Theorem 2 by the extended
continuous mapping theorem in Lemma 2. □

B.7. Proof of Corollary 2. This result follows from Theorem 1 by the extended
continuous mapping theorem in Lemma 2. □

B.8. Proof of Corollary 3. This result follows from Theorem 1 by the functional delta
method in Lemma 3 and the chain rule for Hadamard differentiable functionals in Lemma 4. □

B.9. Proof of Theorem 3. This result follows from the functional delta method for the
bootstrap and other simulation methods in Lemma 6. □
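The proofs of Theorem 2 and Theorem 4 use the Hadamard differentiability of the quantile operator $F \mapsto F^{-1}(u)$, whose derivative in the direction $h$ is $-h(F^{-1}(u))/f(F^{-1}(u))$ (Doss and Gill, 1992). A quick numerical check under illustrative choices: take $F = \Phi$ and the direction $h = \phi$, the standard normal density, for which the derivative equals $-1$ at every $u$. The function names below are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def quantile_of(F, u, lo=-10.0, hi=10.0, iters=200):
    """inf{y : F(y) >= u} by bisection, assuming F is nondecreasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi


def quantile_derivative_check(u=0.3, t=1e-5):
    """Return (difference quotient of the quantile map, Hadamard derivative)."""
    F_t = lambda y: STD_NORMAL.cdf(y) + t * STD_NORMAL.pdf(y)  # F + t h with h = f
    q0 = quantile_of(STD_NORMAL.cdf, u)
    quotient = (quantile_of(F_t, u) - q0) / t
    # -h(Q(u)) / f(Q(u)) with h = f = standard normal density
    derivative = -STD_NORMAL.pdf(q0) / STD_NORMAL.pdf(q0)
    return quotient, derivative


if __name__ == "__main__":
    print(quantile_derivative_check())  # quotient close to -1.0
```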



Appendix C. Limit distribution for the estimators of the effects 

For policy interventions that can be implemented either as a known transformation
of the covariate, $X_1 = g(X_0)$, or as a change in the conditional distribution of $Y$ given
$X$, we can also identify and estimate the distribution of the effect of the policy, $\Delta_j$,
$j \in \{0, 1\}$, under Condition RP stated in the main text. The following
results provide estimators for the distribution and quantile functions of the effects and
limit distribution theory for them. Let $\mathcal{V} = \{\delta \in \mathbb{R} : \delta = y - \tilde{y}, y \in \mathcal{Y}, \tilde{y} \in \mathcal{Y}\}$.

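Lemma 9 (Limit distribution for estimators of conditional distribution and quantile
functions). Let $\widehat{Q}_{\Delta_0}(u|x) = \widehat{Q}_{Y_0}(u|g(x)) - \widehat{Q}_{Y_0}(u|x)$ and $\widehat{Q}_{\Delta_1}(u|x) = \widehat{Q}_{Y_1}(u|x) - \widehat{Q}_{Y_0}(u|x)$ be
estimators of the conditional quantile function of the effect $Q_{\Delta_j}(u|x)$, $j \in \{0, 1\}$. Under
the conditions C, Q, and RP, we have:

$$\sqrt{n}\big(\widehat{Q}_{\Delta_j}(u|x) - Q_{\Delta_j}(u|x)\big) \rightsquigarrow V_{\Delta_j}(u,x), \qquad j \in \{0,1\},$$

in $\ell^\infty((0,1) \times \mathcal{X})$, where $V_{\Delta_0}(u,x) := \sqrt{\lambda_0}[V_0(u, g(x)) - V_0(u,x)]$ and $V_{\Delta_1}(u,x) :=
\sqrt{\lambda_1}V_1(u,x) - \sqrt{\lambda_0}V_0(u,x)$. The Gaussian processes $(u,x) \mapsto V_{\Delta_j}(u,x)$, $j \in \{0,1\}$, have
zero mean and covariance function $\Omega_{V_{\Delta_j},V_{\Delta_r}}(u,x,\tilde{u},\tilde{x}) := E[V_{\Delta_j}(u,x)V_{\Delta_r}(\tilde{u},\tilde{x})]$, for $j,r \in
\{0,1\}$.

Let $\widehat{F}_{\Delta_j}(\delta|x) = \int_0^1 1\{\widehat{Q}_{\Delta_j}(u|x) \leq \delta\}\,du$ be an estimator of the conditional distribution of
the effects $F_{\Delta_j}(\delta|x)$, for $j \in \{0, 1\}$. Under the conditions C, Q, and RP, we have:

$$\sqrt{n}\big(\widehat{F}_{\Delta_j}(\delta|x) - F_{\Delta_j}(\delta|x)\big) \rightsquigarrow -f_{\Delta_j}(\delta|x)\,V_{\Delta_j}(F_{\Delta_j}(\delta|x), x) =: Z_{\Delta_j}(\delta,x), \qquad j \in \{0,1\},$$

in $\ell^\infty(\mathcal{V} \times \mathcal{X})$, and $(\delta,x) \mapsto Z_{\Delta_j}(\delta,x)$, $j \in \{0, 1\}$, have zero mean and covariance function
$\Omega_{Z_{\Delta_j},Z_{\Delta_r}}(\delta,x,\tilde{\delta},\tilde{x}) := E[Z_{\Delta_j}(\delta,x)Z_{\Delta_r}(\tilde{\delta},\tilde{x})]$, for $j,r \in \{0,1\}$. The conditional density of the
effect, $f_{\Delta_j}(\delta|x)$, is assumed to be bounded above and away from zero.

Proof of Lemma 9. The uniform convergence result for the conditional quantile processes
$\sqrt{n}(\widehat{Q}_{\Delta_j}(u|x) - Q_{\Delta_j}(u|x))$, $j \in \{0, 1\}$, follows from Conditions Q and RP by the extended
continuous mapping theorem in Lemma 2. Uniform convergence of the conditional distribution
processes $\sqrt{n}(\widehat{F}_{\Delta_j}(\delta|x) - F_{\Delta_j}(\delta|x))$, $j \in \{0, 1\}$, follows from the convergence of the
quantile process by the functional delta method in Lemma 3. The Hadamard differentiability
of $F_{\Delta_j}(\delta|x)$ with respect to $Q_{\Delta_j}(u|x)$ can be established using the same argument
as in the proof of Lemma 8. □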

Theorem 4 (Limit distribution for estimators of the marginal distribution and quantile 
functions). Under the conditions M, C, Q, and RP, the estimators F^,{6) = 
Jx FAj{S\x)dFxf.{x) of the marginal distributions of the effects F^^{6) jointly converge 
in law to the following Gaussian processes: 

V^(^A,W -^tW) ^ J^ZA^i5,x)dFx,ix) =: Zi^i6), j,ke {0,1}, 

in C.°°{T>), where 5 ^ Z\_{5)^ j,k G {0,1}, have zero mean and covariance function 
:= E[Zi^{5)Z^^J)], forj,k,r,se {0,1}. 

""^^In the distribution approach, Qy^ can be obtained by inversion of the estimator of the conditional 
distribution. 

"'^'^This assumption rules out degenerated distributions for the distribution of effects, such as constant 
policy effects. These "distributions" can be estimated using standard regression methods. 
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Under the conditions M, C, Q, and RP, the estimators Q\.{u) = inf{5 : F^^[5) > u} of 
the marginal quantile functions of the effects Q\. (u) jointly converge in law to the following 
Gaussian processes: 

(q^H - Qi^iu)) -Z%{Q%{u))/fX{Q%{u)) =: Vi^{u), J, k G {0, 1}, 

in i°°{{0,l)), where f^iS) = fAji5\x)dFx^^{x) and u t— > V^Aj(^)? j ^ {O5 1}' have zero 
mean and variance function fiy^^ (u, u) := E\V^. {u)V^^{u)\, for j, k,r,s E {0, 1}. 

Proof of Theorem 4. The uniform convergence result for the marginal distribution functions follows from the convergence of the conditional processes in Lemma 9 by the extended continuous mapping theorem in Lemma [?], since the integral is a continuous operator. Gaussianity of the limit process follows from linearity of the integral. The uniform convergence result for the quantile functions follows from the convergence of the distribution functions by the functional delta method in Lemma 3, since the quantile operator is Hadamard differentiable (see, e.g., Doss and Gill, 1992). □
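The two plug-in steps behind Theorem 4, averaging a conditional distribution over a covariate distribution and then inverting the result, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the logistic conditional distribution `logistic_location_cdf`, the three covariate values, and the search grid are all hypothetical, and the covariate distribution is represented by a finite sample.

```python
import math


def marginal_cdf(cond_cdf, covariates, delta):
    """Plug-in F^k(delta) = ∫ F(delta | x) dF_{X_k}(x), where F_{X_k} is
    represented by the empirical measure of the supplied covariate values."""
    return sum(cond_cdf(delta, x) for x in covariates) / len(covariates)


def marginal_quantile(cond_cdf, covariates, u, grid):
    """Quantile operator Q(u) = inf{delta : F(delta) >= u}, searched over a sorted grid."""
    for delta in grid:
        if marginal_cdf(cond_cdf, covariates, delta) >= u:
            return delta
    return math.inf  # level u is not reached on the grid


def logistic_location_cdf(delta, x):
    """Hypothetical conditional distribution: logistic with location x."""
    return 1.0 / (1.0 + math.exp(-(delta - x)))


covariates = [-1.0, 0.0, 1.0]
grid = [i / 100 for i in range(-500, 501)]
median = marginal_quantile(logistic_location_cdf, covariates, 0.5, grid)
```

Because the covariate values are symmetric around zero, the marginal median is 0, so `median` lands on 0 up to one grid step; the comparison `>= u` at the exact level is sensitive to floating-point rounding, which is why a grid search is only accurate to the grid spacing.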

Appendix D. Inference Theory for Counterfactual Estimators: The Case with Estimated Covariate Distributions

This section presents additional results for the case where the covariate distributions 
are estimated. These results complement the analysis in the main text. 

D.1. Limit theory, bootstrap, and other simulation methods. We start by restating Condition D to incorporate the assumptions about the estimators of the covariate distributions.

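Condition DC. (a) Let $\widehat{Z}_j(y,x) := \sqrt{n}(\widehat{F}_{Y_j}(y|x) - F_{Y_j}(y|x))$ and $\widehat{G}_{X_k}(f) := \sqrt{n}\int f\,d(\widehat{F}_{X_k}(x) - F_{X_k}(x))$, where $\widehat{F}_{X_k}$ are estimated probability measures, for $j,k \in \{0,1\}$. These measures must support the P-Donsker property, namely

$$\big(\widehat{Z}_0, \widehat{Z}_1, \widehat{G}_{X_0}, \widehat{G}_{X_1}\big) \Rightarrow \big(\sqrt{\lambda_0}\,Z_0, \sqrt{\lambda_1}\,Z_1, \sqrt{\lambda_0}\,G_{X_0}, \sqrt{\lambda_1}\,G_{X_1}\big)$$

in the space $\ell^\infty(\mathcal{Y}\times\mathcal{X}) \times \ell^\infty(\mathcal{Y}\times\mathcal{X}) \times \ell^\infty(\mathcal{F}) \times \ell^\infty(\mathcal{F})$, for each $F_X$-Donsker class $\mathcal{F}$, where the right-hand side is a zero-mean Gaussian process and $\lambda_j$ is the limit of the ratio of the sample size in group $j$ to the total sample size $n$, for $j \in \{0,1\}$.

(b) The function class $\{F_{Y_j}(y|\cdot), y \in \mathcal{Y}\}$ is $F_{X_k}$-Donsker, for $j,k \in \{0,1\}$.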

The condition on the estimated measure is weak and is satisfied when $\widehat{F}_{X_k}$ is an empirical measure based on a random sample. Moreover, the condition holds for various smooth empirical measures; in fact, in this case the class of functions $\mathcal{F}$ for which DC(a) holds can be much larger than Glivenko-Cantelli or Donsker (see Radulovic and Wegkamp, 2003, and Giné and Nickl, 2008). Condition DC(b) is also a weak condition that holds for rich classes of functions; see, e.g., van der Vaart (1998).

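Theorem 5 (Limit distribution and inference theory for counterfactual marginal distributions). (1) Under conditions M and DC, the estimators $\widehat{F}^k_{Y_j}(y) = \int \widehat{F}_{Y_j}(y|x)\,d\widehat{F}_{X_k}(x)$ of the marginal distribution functions $F^k_{Y_j}(y)$ jointly converge in law to the following Gaussian processes:

$$\sqrt{n}\,\big(\widehat{F}^k_{Y_j}(y) - F^k_{Y_j}(y)\big) \Rightarrow \sqrt{\lambda_j}\int Z_j(y,x)\,dF_{X_k}(x) + \sqrt{\lambda_k}\,G_{X_k}(F_{Y_j}(y|\cdot)) =: Z^k_j(y), \quad j,k \in \{0,1\}, \tag{D.1}$$

in $\ell^\infty(\mathcal{Y})$, where $y \mapsto Z^k_j(y)$, $j,k \in \{0,1\}$, have zero mean and covariance function, for $j,k,r,s \in \{0,1\}$,

$$\tilde\Sigma^{ks}_{jr}(y,\tilde y) := \sqrt{\lambda_j\lambda_r}\,\Sigma^{ks}_{jr}(y,\tilde y) + \sqrt{\lambda_k\lambda_s}\,E\big[G_{X_k}(F_{Y_j}(y|\cdot))\,G_{X_s}(F_{Y_r}(\tilde y|\cdot))\big], \tag{D.2}$$

where $\Sigma^{ks}_{jr}$ is defined as in (3.4).

(2) Any bootstrap or other simulation method that consistently estimates the law of the empirical process $(\widehat{Z}_0, \widehat{Z}_1, \widehat{G}_{X_0}, \widehat{G}_{X_1})$ in the space $\ell^\infty(\mathcal{Y}\times\mathcal{X}) \times \ell^\infty(\mathcal{Y}\times\mathcal{X}) \times \ell^\infty(\mathcal{F}) \times \ell^\infty(\mathcal{F})$ also consistently estimates the law of the empirical processes $\sqrt{n}(\widehat{F}^k_{Y_j} - F^k_{Y_j})$, $j,k \in \{0,1\}$, jointly in the space $\ell^\infty(\mathcal{Y}) \times \ell^\infty(\mathcal{Y}) \times \ell^\infty(\mathcal{Y}) \times \ell^\infty(\mathcal{Y})$.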

Proof of Theorem 5. The first part of the theorem follows by the functional delta method in Lemma 3 and the Hadamard differentiability of the marginal functionals demonstrated in Lemma 10 below, with $t = 1/\sqrt{n}$. The second part of the theorem follows by the functional delta method for the bootstrap and other simulation methods in Lemma 6. □
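Part (2) of Theorem 5 can be illustrated with a deliberately simplified empirical bootstrap. The sketch below is an assumption-laden illustration, not the paper's procedure: it treats the conditional distribution as known (the normal location model `normal_location_cdf` is hypothetical), so resampling the covariates reproduces only the covariate-estimation part $\widehat{G}_{X_k}$ of the limit in (D.1); a full implementation would also re-estimate $\widehat{F}_{Y_j}(\cdot|\cdot)$ in every draw.

```python
import math
import random


def normal_location_cdf(y, x):
    """Hypothetical conditional distribution F(y | x): normal with mean x and unit variance."""
    return 0.5 * (1.0 + math.erf((y - x) / math.sqrt(2.0)))


def counterfactual_cdf(cond_cdf, xs, y_grid):
    """Plug-in F^k(y) = (1/n) sum_i F(y | x_i) evaluated on a grid of y values."""
    n = len(xs)
    return [sum(cond_cdf(y, x) for x in xs) / n for y in y_grid]


def bootstrap_critical_value(cond_cdf, xs, y_grid, level=0.90, draws=200, seed=0):
    """Approximate level-quantile of sup_y |sqrt(n) (F*(y) - F(y))| over bootstrap draws.

    Only the covariate sample is resampled (with replacement); the conditional
    cdf is held fixed, so only the covariate-estimation part of (D.1) is mimicked."""
    rng = random.Random(seed)
    n = len(xs)
    base = counterfactual_cdf(cond_cdf, xs, y_grid)
    stats = []
    for _ in range(draws):
        xs_star = [rng.choice(xs) for _ in range(n)]
        star = counterfactual_cdf(cond_cdf, xs_star, y_grid)
        stats.append(math.sqrt(n) * max(abs(a - b) for a, b in zip(star, base)))
    stats.sort()
    return stats[min(draws - 1, int(level * draws))]
```

Given the returned value $c$, a uniform band for the counterfactual distribution takes the form $\widehat{F}^k_{Y_j}(y) \pm c/\sqrt{n}$ over the grid. Using the sup-norm statistic is what makes the band simultaneous rather than pointwise.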

The expressions for the covariance functions can be further characterized in some leading 
cases: 

(1) The distributions of the covariates in groups 0 and 1 correspond to different populations and are estimated by the empirical distributions using mutually independent random samples. In this case $G_{X_0}$ and $G_{X_1}$ are independent integrals over Brownian bridges, and the expectation in the second component of the covariance function in (D.2) is $\int_{\mathcal{X}}[F_{Y_j}(y|x) - F^k_{Y_j}(y)][F_{Y_r}(\tilde y|x) - F^k_{Y_r}(\tilde y)]\,dF_{X_k}(x)$ for $k = s$ and zero for $k \neq s$.

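(2) The covariates in group 1 are known transformations of the covariates in group 0, $X_1 = g(X_0)$, and the covariate distribution in group 0 is estimated by the empirical distribution from a random sample. In this case $G_{X_0}$ and $G_{X_1}$ are highly dependent processes. The expectations in the second component of the covariance function in (D.2) are $\int_{\mathcal{X}}[F_{Y_j}(y|x) - F^0_{Y_j}(y)][F_{Y_r}(\tilde y|x) - F^0_{Y_r}(\tilde y)]\,dF_{X_0}(x)$ for $k = s = 0$, $\int_{\mathcal{X}}[F_{Y_j}(y|g(x)) - F^1_{Y_j}(y)][F_{Y_r}(\tilde y|g(x)) - F^1_{Y_r}(\tilde y)]\,dF_{X_0}(x)$ for $k = s = 1$, and $\int_{\mathcal{X}}[F_{Y_j}(y|x) - F^0_{Y_j}(y)][F_{Y_r}(\tilde y|g(x)) - F^1_{Y_r}(\tilde y)]\,dF_{X_0}(x)$ for $k = 0$ and $s = 1$.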
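In case (1), the expectation in the second component of (D.2) is a covariance of conditional distribution functions across covariate values, so a plug-in estimate replaces $F_{X_k}$ with the empirical distribution of the group-$k$ covariates. A minimal sketch; the function names and the uniform conditional distribution are hypothetical, and conditional distribution functions are passed in as callables:

```python
def covariate_cov_component(cond_cdf_j, cond_cdf_r, xs, y, y_tilde):
    """Plug-in estimate, for case (1) with k = s, of
    ∫ [F_{Y_j}(y|x) - F^k_{Y_j}(y)] [F_{Y_r}(y~|x) - F^k_{Y_r}(y~)] dF_{X_k}(x),
    with F_{X_k} replaced by the empirical distribution of the covariate sample xs."""
    n = len(xs)
    a = [cond_cdf_j(y, x) for x in xs]        # F_{Y_j}(y | x_i)
    b = [cond_cdf_r(y_tilde, x) for x in xs]  # F_{Y_r}(y~ | x_i)
    mean_a, mean_b = sum(a) / n, sum(b) / n   # plug-in F^k_{Y_j}(y) and F^k_{Y_r}(y~)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


def uniform_location_cdf(y, x):
    """Hypothetical conditional cdf: uniform on [x, x + 1]."""
    return min(1.0, max(0.0, y - x))
```

When all covariate values coincide the component is zero, reflecting that estimating a degenerate covariate distribution adds no sampling noise.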

Corollary 4. Limit distribution theory and validity of bootstrap and other simulation methods for the estimators of the marginal quantile functions, quantile policy effects, distribution policy effects, and differentiable functionals can be obtained using arguments similar to those for Theorems 2 and 3 and Corollaries 1-3, with obvious changes of notation.

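D.2. Hadamard derivatives of marginal functionals. In order to state the next result, we define the pseudometrics $\rho_{L_2(P)}$ on $\mathcal{Y}\mathcal{X} := \mathcal{Y}\times\mathcal{X}$ and on $\mathcal{F}$ by

$$\rho_{L_2(P)}\big((y,x),(\tilde y,\tilde x)\big) := \Big(E\big[Z_j(y,x) - Z_j(\tilde y,\tilde x)\big]^2\Big)^{1/2}, \quad j \in \{0,1\},$$

$$\rho_{L_2(P)}(f,\tilde f) := \Big(E\big[G_{X_k}(f) - G_{X_k}(\tilde f)\big]^2\Big)^{1/2}, \quad k \in \{0,1\}.$$

It follows from Lemma 18.15 in van der Vaart (1998) that $\mathcal{Y}\mathcal{X}$ is totally bounded under $\rho_{L_2(P)}$ and $Z_j$ has continuous paths with respect to $\rho_{L_2(P)}$ for each $j$. Moreover, the completion of $\mathcal{Y}\mathcal{X}$, denoted $\overline{\mathcal{Y}\mathcal{X}}$, with respect to either of the pseudometrics is compact. Likewise, $\mathcal{F}$ is totally bounded under $\rho_{L_2(P)}$ for each $k$.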

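Lemma 10. Consider the mapping $\phi: \mathbb{D}_\phi \subset \mathbb{D} = \ell^\infty(\mathcal{Y}\mathcal{X}) \times \ell^\infty(\mathcal{F}) \mapsto \mathbb{E} = \ell^\infty(\mathcal{Y})$,

$$\phi(F_{Y_j}, F_{X_k}) := \int F_{Y_j}(\cdot|x)\,dF_{X_k}(x), \quad j,k \in \{0,1\},$$

where the domain $\mathbb{D}_\phi$ is the product of the space of conditional distribution functions $F_{Y_j}(\cdot|\cdot) \in \mathcal{F}$ on $\mathcal{Y}\mathcal{X}$ and the space of bounded maps $f \mapsto \int f\,dF_{X_k}$, where $F_{X_k}$ is a distribution function on $\mathcal{X}$, for $j,k \in \{0,1\}$.$^{18}$ Consider the sequence $(F^t_{Y_j}, F^t_{X_k}) \in \mathbb{D}_\phi$ such that for $\alpha^t_j := (F^t_{Y_j} - F_{Y_j})/(t\sqrt{\lambda_j})$, $d\beta^t_k := (dF^t_{X_k} - dF_{X_k})/(t\sqrt{\lambda_k})$, and $\beta^t_k(f) := \int f\,d\beta^t_k$, as $t \searrow 0$,

$$\alpha^t_j \to \alpha_j \ \text{in } \ell^\infty(\mathcal{Y}\mathcal{X}), \quad \alpha_j \in C(\overline{\mathcal{Y}\mathcal{X}}, \rho_{L_2(P)}), \qquad \beta^t_k \to \beta_k \ \text{in } \ell^\infty(\mathcal{F}), \quad \beta_k \in C(\mathcal{F}, \rho_{L_2(P)}),$$

for the $F_{X_k}$-Donsker class $\mathcal{F}$ and $j,k \in \{0,1\}$. Finally, assume that $\{F_{Y_j}(y|\cdot), y \in \mathcal{Y}\}$ is $F_{X_k}$-Donsker, for $j,k \in \{0,1\}$. Then, as $t \searrow 0$,

$$\frac{\phi(F^t_{Y_j}, F^t_{X_k}) - \phi(F_{Y_j}, F_{X_k})}{t} \to \phi'_{F_{Y_j},F_{X_k}}(\alpha_j,\beta_k) \quad \text{in } \ell^\infty(\mathcal{Y}),$$

where

$$\phi'_{F_{Y_j},F_{X_k}}(\alpha_j,\beta_k) := \sqrt{\lambda_j}\int \alpha_j(\cdot,x)\,dF_{X_k}(x) + \sqrt{\lambda_k}\int F_{Y_j}(\cdot|x)\,d\beta_k(x),$$

and the derivative map $(\alpha,\beta) \mapsto \phi'_{F_{Y_j},F_{X_k}}(\alpha,\beta)$, mapping $\mathbb{D}$ to $\mathbb{E}$, is continuous.

$^{18}$ That is, we identify $F_{X_k}$ with the map $f \mapsto \int f\,dF_{X_k}$ in $\ell^\infty(\mathcal{F})$.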



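Proof of Lemma 10. Write

$$\frac{\phi(F^t_{Y_j}, F^t_{X_k}) - \phi(F_{Y_j}, F_{X_k})}{t} - \phi'_{F_{Y_j},F_{X_k}}(\alpha_j,\beta_k) = \sqrt{\lambda_j}\int(\alpha^t_j - \alpha_j)\,dF_{X_k} + \sqrt{\lambda_k}\int F_{Y_j}\,(d\beta^t_k - d\beta_k) + \sqrt{\lambda_j\lambda_k}\int \alpha_j\,t\,d\beta^t_k + \sqrt{\lambda_j\lambda_k}\int(\alpha^t_j - \alpha_j)\,t\,d\beta^t_k. \tag{D.3}$$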



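The first term of (D.3) is bounded by $\sqrt{\lambda_j}\,\|\alpha^t_j - \alpha_j\|_{\mathcal{Y}\mathcal{X}}\int dF_{X_k} \to 0$. The second term vanishes, since for any $F_{X_k}$-Donsker set $\mathcal{F}$, $\int f\,d\beta^t_k \to \int f\,d\beta_k$ in $\ell^\infty(\mathcal{F})$, and $\{F_{Y_j}(y|\cdot), y \in \mathcal{Y}\} \subset \mathcal{F}$ by assumption. The third term vanishes by the argument provided below. The fourth term vanishes, since $t\,d\beta^t_k = (dF^t_{X_k} - dF_{X_k})/\sqrt{\lambda_k}$ has total variation at most $2/\sqrt{\lambda_k}$, so that

$$\Big|\int(\alpha^t_j - \alpha_j)\,t\,d\beta^t_k\Big| \le \|\alpha^t_j - \alpha_j\|_{\mathcal{Y}\mathcal{X}}\int t\,|d\beta^t_k| \le \frac{2}{\sqrt{\lambda_k}}\,\|\alpha^t_j - \alpha_j\|_{\mathcal{Y}\mathcal{X}} \to 0.$$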



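Since $\alpha_j$ is continuous on the compact semimetric space $(\overline{\mathcal{Y}\mathcal{X}}, \rho_{L_2(P)})$, there exists a finite measurable partition $\cup_{i=1}^m \mathcal{Y}\mathcal{X}_{im}$ of $\mathcal{Y}\mathcal{X}$ such that $\alpha_j$ varies less than $\epsilon$ on each subset. Let $\pi_m(y,x) = (y_{im},x_{im})$ if $(y,x) \in \mathcal{Y}\mathcal{X}_{im}$, where $(y_{im},x_{im})$ is an arbitrarily chosen point within $\mathcal{Y}\mathcal{X}_{im}$ for each $i$; also let $1_{im}(y,x) = 1\{(y,x) \in \mathcal{Y}\mathcal{X}_{im}\}$. Then

$$\begin{aligned}
\Big|\int \alpha_j\,t\,d\beta^t_k\Big| &\le \frac{2}{\sqrt{\lambda_k}}\,\|\alpha_j - \alpha_j\circ\pi_m\|_{\mathcal{Y}\mathcal{X}} + \sum_{i=1}^m |\alpha_j(y_{im},x_{im})|\,t\,|\beta^t_k(1_{im})| \\
&\le \frac{2\epsilon}{\sqrt{\lambda_k}} + \sum_{i=1}^m |\alpha_j(y_{im},x_{im})|\,t\,\big(|\beta_k(1_{im})| + o(1)\big) \\
&\le \frac{2\epsilon}{\sqrt{\lambda_k}} + t\,m\,\|\alpha_j\|_{\mathcal{Y}\mathcal{X}}\Big(\max_{i\le m}|\beta_k(1_{im})| + o(1)\Big) \\
&= \frac{2\epsilon}{\sqrt{\lambda_k}} + O(t),
\end{aligned}$$

since $\{1_{im}, i \le m\}$ is an $F_{X_k}$-Donsker class. The constant $\epsilon$ is arbitrary, so the left-hand side of the preceding display converges to zero.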

Finally, the norm on $\mathbb{D}$ is given by $\|\cdot\|_{\mathcal{Y}\mathcal{X}} \vee \|\cdot\|_{\mathcal{F}}$. The second component of the derivative map is trivially continuous with respect to $\|\cdot\|_{\mathcal{F}}$. The first component is continuous with respect to $\|\cdot\|_{\mathcal{Y}\mathcal{X}}$ by the same bound used above to show that the first term in (D.3) vanishes. Hence the derivative map is continuous. □
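When $F_{X_k}$ has finitely many support points, the expansion (D.3) is an exact algebraic identity, which gives a quick numerical check of the derivative formula in Lemma 10. In the hypothetical discrete setting below (all arrays are made up for illustration, and the perturbations do not vary with $t$), the remainder equals $t\sqrt{\lambda_j\lambda_k}\int\alpha_j\,d\beta_k$, so it shrinks linearly in $t$.

```python
import math


def phi(F, p):
    """phi(F, P)(y) = sum_i F[y][i] * p[i]: F(y | .) integrated against a discrete measure."""
    return [sum(f * w for f, w in zip(row, p)) for row in F]


def phi_prime(F, p, alpha, beta, lam_j, lam_k):
    """Derivative from Lemma 10: sqrt(lam_j) ∫ alpha dF_X + sqrt(lam_k) ∫ F dbeta."""
    a, b = phi(alpha, p), phi(F, beta)
    return [math.sqrt(lam_j) * ai + math.sqrt(lam_k) * bi for ai, bi in zip(a, b)]


def remainder(F, p, alpha, beta, lam_j, lam_k, t):
    """sup_y |difference quotient - derivative| along F^t = F + t sqrt(lam_j) alpha and
    P^t = P + t sqrt(lam_k) beta (beta sums to zero, so P^t keeps total mass one)."""
    Ft = [[f + t * math.sqrt(lam_j) * a for f, a in zip(row, arow)]
          for row, arow in zip(F, alpha)]
    pt = [w + t * math.sqrt(lam_k) * b for w, b in zip(p, beta)]
    quotient = [(u - v) / t for u, v in zip(phi(Ft, pt), phi(F, p))]
    deriv = phi_prime(F, p, alpha, beta, lam_j, lam_k)
    return max(abs(q - d) for q, d in zip(quotient, deriv))


# Hypothetical inputs: two y-grid points (rows) and two covariate support points (columns).
F = [[0.2, 0.5], [0.6, 0.9]]
p = [0.5, 0.5]
alpha = [[1.0, -1.0], [0.5, 0.5]]
beta = [0.3, -0.3]
```

With $\lambda_j = \lambda_k = 1/2$ the remainder is $0.3\,t$: about 0.03 at $t = 0.1$ and about 0.0003 at $t = 0.001$, matching the third term of (D.3).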
Appendix E. Functional Delta Method and Bootstrap and Other Simulation Methods for Z-processes

This section derives a preliminary result that is key to deriving the limit distribution and 
inference theory for various estimators of the conditional distribution and quantile
functions. This result shows that suitably defined Z-estimators satisfy a functional central limit
theorem and that we can estimate their laws using bootstrap and related methods. The 
result follows from a lemma that establishes Hadamard differentiability of Z-functionals in 
spaces that are particularly well-suited for our applications. 

E.1. Limit distribution and inference theory for approximate Z-processes. Let us consider an index set $T$ and a set $\Theta \subset \mathbb{R}^p$. We consider Z-estimation processes $\{\hat\theta(u), u \in T\}$, where for each $u \in T$, $\hat\theta(u)$ satisfies $\|\widehat\Psi(\hat\theta(u),u)\| \le \inf_{\theta\in\Theta}\|\widehat\Psi(\theta,u)\| + \epsilon_n$, with $\epsilon_n \searrow 0$ at some rate. That is, $\hat\theta(u)$ is an approximate solution to the problem of minimizing $\|\widehat\Psi(\theta,u)\|$ over $\theta \in \Theta$. The random function $(\theta,u) \mapsto \widehat\Psi(\theta,u)$ is an estimator of some fixed population function $(\theta,u) \mapsto \Psi(\theta,u)$ and satisfies a functional central limit theorem. The following lemma specifies conditions under which the Z-processes satisfy a functional central limit theorem, and under which bootstrap and other simulation methods consistently estimate the law of this process.
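A one-dimensional case shows why approximate zeros are needed. Take the scalar moment $\widehat\Psi(\theta,u) = \mathbb{E}_n[u - 1\{Y_i \le \theta\}]$, whose population zero is the $u$-quantile. Because $\widehat\Psi$ is a step function in $\theta$, an exact zero need not exist; searching over the order statistics returns a point within $1/n$ of the infimum (assuming no ties in the data), so $\epsilon_n = 1/n = o(n^{-1/2})$. The function names and data are illustrative only; the sketch also shows that the usual sample quantile is another such approximate zero.

```python
def psi_hat(ys, theta, u):
    """Empirical moment E_n[u - 1{Y_i <= theta}] for a scalar parameter theta."""
    return u - sum(1 for y in ys if y <= theta) / len(ys)


def approx_z_estimate(ys, u):
    """Order statistic minimizing |psi_hat|. psi_hat is a step function that jumps
    only at data points, so (with no ties) this is within 1/n of the infimum."""
    return min(sorted(ys), key=lambda theta: abs(psi_hat(ys, theta, u)))


def sample_quantile(ys, u):
    """Classical sample quantile inf{y : F_n(y) >= u}."""
    for y in sorted(ys):
        if sum(1 for z in ys if z <= y) / len(ys) >= u:
            return y
    return max(ys)


ys = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0]
```

For $u = 0.5$ both functions return 3.0, an exact zero. For $u = 0.3$ no exact zero exists: `approx_z_estimate` returns 1.5 and `sample_quantile` returns 2.0, and both leave $|\widehat\Psi| \le 1/8$.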

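Lemma 11 (Limit distribution and inference theory for approximate Z-processes). Let $T$ be a relatively compact set of some metric space, and $\Theta$ be a compact subset of $\mathbb{R}^p$. Assume that

(i) for each $u \in T$, $\Psi(\cdot,u): \Theta \mapsto \mathbb{R}^p$ possesses a unique zero at $\theta_0(u) \in \text{interior}\,\Theta$, and has an inverse $\Psi^{-1}(\cdot,u)$ that is continuous at 0 uniformly in $u \in T$;

(ii) $\Psi(\cdot,u)$ is continuously differentiable at $\theta_0(u)$ uniformly in $u \in T$, with derivative $\dot\Psi_{\theta_0(u),u}$ that is uniformly nonsingular, namely $\inf_{u\in T}\inf_{\|h\|=1}\|\dot\Psi_{\theta_0(u),u}h\| > 0$;

(iii) $\sqrt{n}(\widehat\Psi - \Psi) \Rightarrow Z$ in $\ell^\infty(\Theta \times T)$, where $Z$ is a.s. continuous on $\Theta \times T$ with respect to the Euclidean metric;

(iv) bootstrap or some other method consistently estimates the law of $\sqrt{n}(\widehat\Psi - \Psi)$.

For each $u \in T$, let $\hat\theta(u)$ be such that $\|\widehat\Psi(\hat\theta(u),u)\| \le \inf_{\theta\in\Theta}\|\widehat\Psi(\theta,u)\| + \epsilon_n$, with $\epsilon_n = o(n^{-1/2})$. Then, under conditions (i)-(iii),

$$\sqrt{n}\big(\hat\theta(\cdot) - \theta_0(\cdot)\big) \Rightarrow -\dot\Psi^{-1}_{\theta_0(\cdot),\cdot}\big[Z(\theta_0(\cdot),\cdot)\big] \quad \text{in } \ell^\infty(T).$$

Moreover, any bootstrap or other method that satisfies condition (iv) consistently estimates the law of the empirical process $\sqrt{n}(\hat\theta - \theta_0)$ in $\ell^\infty(T)$.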
Proof of Lemma 11. The results follow by the functional delta method in Lemma 3, by the functional delta method for bootstrap and other methods in Lemma 6, and by the Hadamard differentiability of Z-functionals established in Lemma 12, with $t = 1/\sqrt{n}$. □

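The proof of the preceding result relies on the following lemma. Let $T$ be a relatively compact set of some metric space, and $\Theta$ be a compact subset of $\mathbb{R}^p$. An element $\theta \in \Theta$ is an $r$-approximate zero of the map $\theta \mapsto z(\theta,u)$ if for some $r \ge 0$

$$\|z(\theta,u)\| \le \inf_{\theta'\in\Theta}\|z(\theta',u)\| + r.$$

Let $\phi(\cdot,r): \ell^\infty(\Theta) \mapsto \Theta$ be a map that assigns one of its $r$-approximate zeroes $\phi(z(\cdot,u),r)$ to each element $z(\cdot,u) \in \ell^\infty(\Theta)$.

Lemma 12. Assume that conditions (i) and (ii) on the function $\Psi$ stated in the preceding lemma hold. Take any $z_t \to z$ uniformly on $\Theta \times T$ as $t \searrow 0$, for a continuous map $z: \Theta \times T \mapsto \mathbb{R}^p$, and suppose that $q_t \searrow 0$ uniformly on $T$ as $t \searrow 0$. Then, for the $tq_t(u)$-approximate zero of $\Psi(\cdot,u) + tz_t(\cdot,u)$, denoted as $\theta_t(u) = \phi(\Psi(\cdot,u) + tz_t(\cdot,u), tq_t(u))$, we have that, uniformly in $u \in T$,

$$t^{-1}\big(\theta_t(u) - \theta_0(u)\big) \to -\dot\Psi^{-1}_{\theta_0(u),u}\big[z(\theta_0(u),u)\big].$$

Here it is useful to think of $t$ as $1/\sqrt{n}$, where $n$ is the sample size.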
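The linear representation in Lemma 12 can be checked numerically in a scalar example. The choices below are hypothetical: $\Psi(\theta,u) = \Lambda(\theta) - u$ with $\Lambda$ the logistic distribution function, and the perturbation $z(\theta) = 0.2\,\Lambda(\theta)$, held fixed in $t$. The lemma predicts $t^{-1}(\theta_t(u) - \theta_0(u)) \to -z(\theta_0(u))/\Lambda'(\theta_0(u))$, which here equals $-0.2/(1-u)$; the zero of the perturbed map is found by bisection, which is valid because the perturbed map is strictly increasing.

```python
import math


def logistic(theta):
    """Logistic distribution function Lambda(theta)."""
    return 1.0 / (1.0 + math.exp(-theta))


def bisect_root(f, lo, hi, tol=1e-13):
    """Root of a continuous f with f(lo) < 0 < f(hi), by bisection."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid < 0.0) == (f_lo < 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def perturbed_zero(u, t):
    """Zero of Psi(theta, u) + t z(theta) with the hypothetical choices
    Psi(theta, u) = logistic(theta) - u and z(theta) = 0.2 * logistic(theta)."""
    return bisect_root(lambda th: logistic(th) - u + t * 0.2 * logistic(th), -30.0, 30.0)


u, t = 0.3, 1e-4
theta0 = math.log(u / (1.0 - u))  # exact zero of Psi(., u)
quotient = (perturbed_zero(u, t) - theta0) / t
# Lemma 12: -Psi_dot^{-1}[z(theta0)] = -z(theta0) / Lambda'(theta0), with Lambda' = Lambda(1 - Lambda)
predicted = -0.2 * logistic(theta0) / (logistic(theta0) * (1.0 - logistic(theta0)))
```

Here `quotient` agrees with `predicted` (about $-0.2857$) up to an $O(t)$ error, illustrating that the Z-functional is differentiable with derivative $-\dot\Psi^{-1}_{\theta_0(u),u}$ applied to the perturbation at $\theta_0(u)$.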

Remark. Our lemma is an alternative to Lemma 3.9.34 of van der Vaart and Wellner (1996) on Hadamard differentiability of Z-functionals in general normed spaces. The conditions of their lemma are difficult to meet in our context because they include the uniform convergence of the functions $z_t$ over the parameter space $\ell^\infty(T)$, the collection of all bounded functions on $T$, which is an extremely large parameter space. In particular, to apply their lemma we need the empirical processes $\sqrt{n}(\widehat\Psi - \Psi)$ indexed by $\mathcal{F} = \ell^\infty(T)$ to converge weakly in the space $\ell^\infty(\mathcal{F} \times T)$, which appears to be difficult to attain in applications such as quantile regression processes. Indeed, weak convergence in this space requires $\mathcal{F}$ to be totally bounded, which is hard to attain when $\mathcal{F}$ is too rich a space. See van der Vaart and Wellner (1996), p. 396, for a comment on the limitation of their Lemma 3.9.34. Moreover, our lemma allows for approximate Z-estimators. This allows us to cover quantile regression processes, for which exact Z-estimators generally do not exist.

Proof of Lemma 12. We have that $\Psi(\theta_0(u),u) = 0$ for all $u \in T$. Let $z_t \to z$ uniformly on $\Theta \times T$ for a map $z: \Theta \times T \mapsto \mathbb{R}^p$ that is continuous at each point, and $q_t \searrow 0$ uniformly in $u \in T$ as $t \searrow 0$. By definition, $\theta_t(u) = \phi(\Psi(\cdot,u) + tz_t(\cdot,u), tq_t(u))$ satisfies

$$\big\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u) + tz_t(\theta_t(u),u)\big\| \le \inf_{\theta\in\Theta}\big\|\Psi(\theta,u) + tz_t(\theta,u)\big\| + tq_t(u) =: t\lambda_t(u) + tq_t(u),$$

uniformly in $u \in T$. The rest of the proof has three steps. In Step 1, we establish a rate of convergence for $\theta_t(\cdot)$ to $\theta_0(\cdot)$. In Step 2, we verify the main claim of the lemma concerning the linear representation for $t^{-1}(\theta_t(\cdot) - \theta_0(\cdot))$, assuming that $\lambda_t(\cdot) = o(1)$. In Step 3, we verify that $\lambda_t(\cdot) = o(1)$.

Step 1. Here we show that $\|\theta_t(u) - \theta_0(u)\| = O(t)$ uniformly in $u \in T$. Note that $\lambda_t(u) \le \|t^{-1}\Psi(\theta_0(u),u) + z_t(\theta_0(u),u)\| = \|z(\theta_0(u),u) + o(1)\| = O(1)$ uniformly in $u \in T$. We conclude that, uniformly in $u \in T$, as $t \searrow 0$,

$$t^{-1}\big(\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)\big) = -z_t(\theta_t(u),u) + O(\lambda_t(u) + q_t(u)) = O(1),$$

and that, uniformly in $u \in T$, $\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)\| = O(t)$. By assumption $\Psi(\cdot,u)$ has a unique zero at $\theta_0(u)$ and has an inverse that is continuous at zero uniformly in $u \in T$; hence it follows that, uniformly in $u \in T$,

$$\|\theta_t(u) - \theta_0(u)\| \le d_H\big(\Psi^{-1}(\Psi(\theta_t(u),u),u), \Psi^{-1}(0,u)\big) \to 0,$$

where $d_H$ is the Hausdorff distance. By continuous differentiability assumed to hold uniformly in $u \in T$, $\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u) - \dot\Psi_{\theta_0(u),u}[\theta_t(u) - \theta_0(u)]\| = o(\|\theta_t(u) - \theta_0(u)\|)$, so that uniformly in $u \in T$

$$\liminf_{t\searrow 0}\frac{\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)\|}{\|\theta_t(u) - \theta_0(u)\|} \ge \inf_{u\in T}\inf_{\|h\|=1}\|\dot\Psi_{\theta_0(u),u}(h)\| =: c > 0,$$

where $h$ ranges over $\mathbb{R}^p$, and $c > 0$ by assumption. Thus, uniformly in $u \in T$, $\|\theta_t(u) - \theta_0(u)\| \le c^{-1}\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)\|\,(1 + o(1)) = O(t)$.

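Step 2. Here we verify the main claim of the lemma. Using continuous differentiability uniformly in $u$ again, together with Step 1, we conclude that $\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u) - \dot\Psi_{\theta_0(u),u}[\theta_t(u) - \theta_0(u)]\| = o(t)$. Below we will show that $\lambda_t(u) = o(1)$, and we also have $q_t(u) = o(1)$ uniformly in $u \in T$ by assumption. Thus, since $z_t \to z$ uniformly, $z$ is continuous, and $\theta_t(u) \to \theta_0(u)$ uniformly, we can conclude that, uniformly in $u \in T$, $t^{-1}(\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)) = -z_t(\theta_t(u),u) + o(1) = -z(\theta_0(u),u) + o(1)$ and

$$\begin{aligned}
t^{-1}[\theta_t(u) - \theta_0(u)] &= \dot\Psi^{-1}_{\theta_0(u),u}\big[t^{-1}\big(\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)\big) + o(1)\big] \\
&= -\dot\Psi^{-1}_{\theta_0(u),u}[z(\theta_0(u),u)] + o(1).
\end{aligned}$$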
Step 3. In this step we show that $\lambda_t(u) = o(1)$ uniformly in $u \in T$. Note that for $\bar\theta_t(u) := \theta_0(u) - t\dot\Psi^{-1}_{\theta_0(u),u}[z(\theta_0(u),u)] = \theta_0(u) + O(t)$, we have that $\bar\theta_t(u) \in \Theta$ for small enough $t$, uniformly in $u \in T$; moreover,

$$\lambda_t(u) \le \big\|t^{-1}\Psi(\bar\theta_t(u),u) + z_t(\bar\theta_t(u),u)\big\| = \big\|-\dot\Psi_{\theta_0(u),u}\{\dot\Psi^{-1}_{\theta_0(u),u}[z(\theta_0(u),u)]\} + z(\theta_0(u),u) + o(1)\big\| = o(1),$$

as $t \searrow 0$. □

Appendix F. Z-Estimators of Conditional Quantile and Distribution Functions

This section derives limit theory for the principal estimators of conditional distribution and quantile functions. These results establish the validity of bootstrap and other resampling plans for the entire quantile regression process, the entire distribution regression process, and related processes arising in estimation of various conditional quantile and distribution functions. These results may be of substantial independent interest.

In order to prove the results, we use Lemmas 11 and 12. We also specify some primitive conditions that cover all of our leading examples. In all these examples, we have functional parameter values $u \mapsto \theta(u)$, where $u \in T \subset \mathbb{R}$ and $\theta(u) \in \Theta \subset \mathbb{R}^p$, where for each $u \in T$, $\theta_0(u)$ solves the equation

$$\Psi(\theta,u) := E[g(W,\theta,u)] = 0,$$

where $g: \mathcal{W} \times \Theta \times T \mapsto \mathbb{R}^p$ and $W := (X,Y)$ is a random vector with support $\mathcal{W}$. For estimation purposes we have an empirical analog of the above moment functions

$$\widehat\Psi(\theta,u) = \mathbb{E}_n[g(W_i,\theta,u)],$$

where $\mathbb{E}_n$ is the empirical expectation and $(W_1,\ldots,W_n)$ is a random sample from $W$. For each $u \in T$, the estimator $\hat\theta(u)$ satisfies $\|\widehat\Psi(\hat\theta(u),u)\| \le \inf_{\theta\in\Theta}\|\widehat\Psi(\theta,u)\| + \epsilon_n$, with $\epsilon_n = o(n^{-1/2})$.

Condition Z.1. The set $\Theta$ is a compact subset of $\mathbb{R}^p$, and $T$ is either a finite subset or a bounded open subset of $\mathbb{R}$.

(i) For each $u \in T$, $\Psi(\theta,u) := Eg(W,\theta,u) = 0$ has a unique zero at $\theta_0(u) := (\alpha_0(u)', \beta_0')' \in \text{interior}\,\Theta$.

(ii) The map $(\theta,u) \mapsto \Psi(\theta,u)$ is continuously differentiable at $(\theta_0(u),u)$ with a uniformly bounded derivative on $T$, where differentiability in $u$ needs to hold only in the case of $T$ being a bounded open subset of $\mathbb{R}$; $\dot\Psi_{\theta,u} = G(\theta,u) = \frac{\partial}{\partial\theta'}Eg(W,\theta,u)$ is uniformly nonsingular at $\theta_0(u)$, namely $\inf_{u\in T}\inf_{\|h\|=1}\|\dot\Psi_{\theta_0(u),u}h\| > 0$.

(iii) The function set $\mathcal{G} = \{g(W,\theta,u), (\theta,u) \in \Theta \times T\}$ is P-Donsker with a square-integrable envelope $\bar G$. The map $(\theta,u) \mapsto g(W,\theta,u)$ is continuous at each $(\theta,u) \in \Theta \times T$ with probability one.

Condition Z.2. Either of the following holds:

(a) the conditional distribution function has the form $F_Y(u|x) = \Lambda(x,\theta_0(u))$; or

(b) the conditional quantile function has the form $Q_Y(u|x) = Q(x,\theta_0(u))$,

where the functions $\theta \mapsto \Lambda(x,\theta)$ and $\theta \mapsto Q(x,\theta)$ are continuously differentiable in $\theta$ with derivatives that are uniformly bounded over the set $\mathcal{X}$.

Lemma 13. Condition Z.1 implies conditions (i)-(iv) of Lemma 11. In particular, condition (iii) holds with $\sqrt{n}(\widehat\Psi - \Psi) \Rightarrow Z$ in $\ell^\infty(\Theta \times T)$, where $Z$ is a zero-mean Gaussian process with continuous paths and $u \mapsto Z(\theta_0(u),u)$ has covariance function

$$\Omega(u,\tilde u) = E\big[g(W,\theta_0(u),u)\,g(W,\theta_0(\tilde u),\tilde u)'\big].$$

Condition (iv) holds with the set of consistent methods for estimating the law of $\sqrt{n}(\widehat\Psi - \Psi)$ consisting of the bootstrap and, more generally, exchangeable bootstraps. Consequently, the conclusions of Lemma 11 hold, namely

$$\sqrt{n}\big(\hat\theta(\cdot) - \theta_0(\cdot)\big) \Rightarrow -G(\theta_0(\cdot),\cdot)^{-1}\big[Z(\theta_0(\cdot),\cdot)\big] \quad \text{in } \ell^\infty(T).$$

Moreover, the bootstrap and exchangeable bootstraps consistently estimate the law of the empirical process $\sqrt{n}(\hat\theta - \theta_0)$.

This lemma presents a useful result in its own right. From the point of view of this paper, 
the following result, a corollary of the lemma, is of immediate interest to us since it verifies 
Condition D and Condition Q for a wide class of estimators of conditional distribution and 
quantile functions. 

Theorem 6 (Limit distribution and inference theory for Z-estimators of conditional distribution and quantile functions). 1. Under conditions Z.1-Z.2(a), the estimator $(u,x) \mapsto \widehat{F}_Y(u|x) = \Lambda(x,\hat\theta(u))$ of the conditional distribution function $(u,x) \mapsto F_Y(u|x)$ converges in law to a continuous Gaussian process:

$$\sqrt{n}\big(\widehat{F}_Y(u|x) - F_Y(u|x)\big) \Rightarrow Z(u,x) := -\frac{\partial\Lambda(x,\theta_0(u))}{\partial\theta'}\,G(\theta_0(u),u)^{-1}Z(\theta_0(u),u) \tag{F.1}$$

in $\ell^\infty(\mathcal{Y}\times\mathcal{X})$, where $(u,x) \mapsto Z(u,x)$ has zero mean and covariance function $\Sigma_Z(u,x,\tilde u,\tilde x) := E[Z(u,x)Z(\tilde u,\tilde x)]$. Moreover, the bootstrap and exchangeable bootstraps consistently estimate the law of $Z$.

2. Under conditions Z.1-Z.2(b), the estimator $(u,x) \mapsto \widehat{Q}_Y(u|x) = Q(x,\hat\theta(u))$ of the conditional quantile function $(u,x) \mapsto Q_Y(u|x)$ converges in law to a continuous Gaussian process:

$$\sqrt{n}\big(\widehat{Q}_Y(u|x) - Q_Y(u|x)\big) \Rightarrow V(u,x) := -\frac{\partial Q(x,\theta_0(u))}{\partial\theta'}\,G(\theta_0(u),u)^{-1}Z(\theta_0(u),u) \tag{F.2}$$

in $\ell^\infty((0,1)\times\mathcal{X})$, where the process $(u,x) \mapsto V(u,x)$ has zero mean and covariance function $\Sigma_V(u,x,\tilde u,\tilde x) := E[V(u,x)V(\tilde u,\tilde x)]$. Moreover, the bootstrap and exchangeable bootstraps consistently estimate the law of $V$.

Proof of Lemma 13. We shall verify conditions (i)-(iv) of Lemma 11.

We consider the case where $T$ is a bounded open subset of $\mathbb{R}$. The proof for the case with a finite $T$ is simpler and follows similarly. To show condition (i), we note that by the implicit function theorem and uniqueness of $\theta_0$, the inverse map $\Psi^{-1}(\mu,u)$ exists on an open neighborhood of each pair $(\mu = 0, u)$, and it is continuously differentiable in $(\mu,u)$ at each pair $(\mu = 0, u)$ with a uniformly bounded derivative. This implies that for any sequence of points $(\mu_t,u_t) \to (0,u)$ with $u \in \bar T$, where $\bar T$ is the closure of $T$, we have that $\|\Psi^{-1}(\mu_t,u_t) - \Psi^{-1}(0,u_t)\| = O(\|\mu_t\|) = o(1)$, verifying the continuity of the inverse map at 0 uniformly in $u$. We can also conclude that $\theta_0(u) = \Psi^{-1}(0,u)$ is uniformly continuous on $T$ and we can extend it to $\bar T$ by taking limits.

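To show condition (ii), we take any sequence $(u_t,h_t) \to (u,h)$ with $u \in \bar T$, $h \in \mathbb{R}^p$, and then note that, by the mean value theorem, for some $t^* \in [0,t]$ (which may differ across the components of $\Psi$),

$$\Delta_t(u_t,h_t) := t^{-1}\big\{\Psi(\theta_0(u_t) + th_t, u_t) - \Psi(\theta_0(u_t),u_t)\big\} = \frac{\partial\Psi(\theta_0(u_t) + t^*h_t, u_t)}{\partial\theta'}\,h_t \to \frac{\partial\Psi(\theta_0(u),u)}{\partial\theta'}\,h = G(\theta_0(u),u)h,$$

using the continuity hypotheses on the derivative $\partial\Psi/\partial\theta'$ and the continuity of $\theta_0(u)$. Hence by Lemma 5, we conclude that $\sup_{u\in T}\sup_{\|h\|\le 1}\|\Delta_t(u,h) - G(\theta_0(u),u)h\| \to 0$ as $t \searrow 0$.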

To show condition (iii), note that by the Donsker central limit theorem for $\widehat\Psi(\theta,u) = \mathbb{E}_n[g(W_i,\theta,u)]$ we have that $\sqrt{n}(\widehat\Psi - \Psi) \Rightarrow Z$, where $Z$ is a zero-mean Gaussian process that has continuous paths with respect to the $L_2(P)$ semimetric on $\mathcal{G}$ and whose restriction $u \mapsto Z(\theta_0(u),u)$ has covariance function $\Omega(u,\tilde u) = E[g(W,\theta_0(u),u)\,g(W,\theta_0(\tilde u),\tilde u)']$. The map $(\theta,u) \mapsto g(W,\theta,u)$ is continuous at each $(\theta,u)$ with probability one. The only result that is not immediate from the assumptions stated is that $Z$ also has continuous paths on $\Theta \times T$ with respect to the Euclidean metric $\|\cdot\|$. By assumption $Z$ has continuous paths with respect to $\rho_{L_2(P)}((\theta,u),(\tilde\theta,\tilde u)) = \{E[g(W,\theta,u) - g(W,\tilde\theta,\tilde u)]^2\}^{1/2}$. As $\|(\theta,u) - (\tilde\theta,\tilde u)\| \to 0$, we have that $g(W,\theta,u) - g(W,\tilde\theta,\tilde u) \to 0$ almost surely. It follows by the dominated convergence theorem, with dominating function $(2\bar G)^2$, where $\bar G$ is the square-integrable envelope for the function class $\mathcal{G}$, that $\{E[g(W,\theta,u) - g(W,\tilde\theta,\tilde u)]^2\}^{1/2} \to 0$. This verifies the continuity condition. The square-integrable envelope $\bar G$ exists by assumption.

To show (iv), we simply invoke Theorem 3.6.13 in van der Vaart and Wellner (1996), which implies that the bootstrap and, more generally, exchangeable bootstraps consistently estimate the limit law of $\sqrt{n}(\widehat\Psi - \Psi)$, namely the law of $Z$, in the sense of equation (A.2). □

Proof of Theorem 6. This result follows directly from Lemma 12, the functional delta method in Lemma 3, the chain rule for Hadamard differentiable functionals in Lemma [?], and the preservation of validity of bootstrap and other methods for Hadamard differentiable functionals in Lemma 6. □

F.1. Examples of conditional quantile estimation methods. We consider the location and quantile regression models described in the text.

Example 2. Quantile regression. The conditional quantile function of the outcome variable $Y$ given the covariate vector $X$ is given by $X'\beta_0(\cdot)$. Here we can take the moment functions corresponding to the canonical quantile regression approach:

$$g(W,\beta,u) = (u - 1\{Y \le X'\beta\})X. \tag{F.3}$$

We assume that the conditional density $f_Y(\cdot|X)$ is uniformly bounded and is continuous at $X'\beta_0(u)$ uniformly in $u \in T$, almost surely; moreover, $\inf_{u\in T} f_Y(X'\beta_0(u)|X) > c > 0$ almost surely; and $EXX'$ is finite and of full rank. The true parameter $\beta_0(u)$ solves $Eg(W,\beta,u) = 0$, and we assume that the parameter space $B$ is such that $\beta_0(u) \in \text{interior}\,B$ for each $u \in (0,1)$.

Lemma 14. Conditions Z.1-Z.2(b) hold for this example with moment function given by (F.3), $T = (0,1)$, $Q_Y(u|x) = x'\beta_0(u)$, $G(\beta_0(u),u) = -E[f_Y(X'\beta_0(u)|X)XX']$, and

$$\Omega(u,\tilde u) = \{\min(u,\tilde u) - u\tilde u\}\,E[XX'].$$

Proof of Lemma 14. To show Z.1, we need to verify conditions on the derivatives of the map $(\beta,u) \mapsto Eg(W,\beta,u)$. It is straightforward to show that at $(\beta,u) = (\beta_0(u),u)$,

$$\frac{\partial}{\partial(\beta',u)}Eg(W,\beta,u) = [G(\beta,u), EX] = \big[-E[f_Y(X'\beta|X)XX'], EX\big],$$

and the right-hand side is continuous at $(\beta_0(u),u)$. This follows using the dominated convergence theorem, the a.s. continuity and boundedness of the mapping $y \mapsto f_Y(y|X)$ at $X'\beta_0(u)$, as well as finiteness of $E\|X\|^2$. Finally, note that $\beta_0(u)$ is the unique solution to $Eg(W,\beta,u) = 0$ for each $u$ because it is a root of the gradient of a convex function. Moreover, uniformly in $u \in (0,1)$, $-G(\beta_0(u),u) \ge \underline{f}\,EXX' > 0$ in the positive definite sense, where $\underline{f}$ is the uniform lower bound on $f_Y(X'\beta_0(u)|X)$.



To show Z.1(iii), we verify that the function class $\mathcal{G}$ is P-Donsker with a square-integrable envelope and that the continuity hypothesis holds. The function classes $\mathcal{F}_1 = \{u, u \in T\}$ (constant functions) and $\mathcal{F}_2 = \{1\{Y \le X'\beta\}, \beta \in \mathbb{R}^p\}$ are VC classes. Therefore the function classes $\mathcal{F}_{kj} = \mathcal{F}_k X_j$ are also VC classes because they are formed as products of a VC class with a fixed function (Lemma 2.6.18 in van der Vaart and Wellner, 1996). The difference $\mathcal{F}_{1j} - \mathcal{F}_{2j}$ is a Lipschitz transform of VC classes, so it is P-Donsker by Example 19.9 in van der Vaart (1998). The collection $\mathcal{G} = \{\mathcal{F}_{1j} - \mathcal{F}_{2j}, j = 1,\ldots,p\}$ is thus also Donsker. The envelope is given by $2\max_j|X_j|$, which is square-integrable. Finally, the map $(\beta,u) \mapsto (u - 1\{Y \le X'\beta\})X$ is continuous at each $(\beta,u) \in B \times T$ with probability one by the absolute continuity of the conditional distribution of $Y$.

To show Z.2(b), we note that the map $(x,\theta) \mapsto x'\theta$ trivially verifies the hypotheses of Z.2(b) provided the set $\mathcal{X}$ is compact. □
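For intuition on (F.3), the following sketch fits a median regression of $y$ on $(1,x)$ and then evaluates the empirical moment $\mathbb{E}_n[(u - 1\{Y \le X'\hat\beta\})X]$. It is an illustration under stated assumptions rather than a practical estimator: with only two coefficients it searches all lines through pairs of observations, relying on the linear-programming fact that some such interpolating fit (a basic solution) minimizes the check-function objective, and the data are made up. The moment is not exactly zero, only of order $1/n$, which is why quantile regression is treated as an approximate Z-estimator in Lemma 11.

```python
import math
from itertools import combinations


def check_loss(u, r):
    """Check function rho_u(r) = r * (u - 1{r < 0})."""
    return r * (u - (1.0 if r < 0.0 else 0.0))


def qr_objective(xs, ys, u, b0, b1):
    """Quantile regression objective sum_i rho_u(y_i - b0 - b1 * x_i)."""
    return sum(check_loss(u, y - b0 - b1 * x) for x, y in zip(xs, ys))


def qr_fit(xs, ys, u):
    """Exact quantile regression of y on (1, x) for tiny samples: search all lines
    through two observations with distinct x (the basic solutions of the LP)."""
    best, best_loss = None, math.inf
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue
        b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        loss = qr_objective(xs, ys, u, b0, b1)
        if loss < best_loss:
            best, best_loss = (b0, b1), loss
    return best


def qr_moment(xs, ys, u, beta):
    """Empirical version of (F.3): E_n[(u - 1{Y <= X'beta}) X] with X = (1, x)."""
    b0, b1 = beta
    w = [u - (1.0 if y <= b0 + b1 * x else 0.0) for x, y in zip(xs, ys)]
    n = len(xs)
    return sum(w) / n, sum(wi * x for wi, x in zip(w, xs)) / n


# Made-up data scattered around y = 1 + 2x.
xs = [float(i) for i in range(10)]
ys = [1.13, 2.73, 5.31, 6.95, 9.22, 10.59, 13.08, 14.81, 17.36, 18.88]
beta = qr_fit(xs, ys, 0.5)
```

The exhaustive search costs $O(n^3)$ and is only meant to make the estimator transparent; real implementations solve the same linear program with simplex or interior-point methods.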

Example 1. Classical regression. This is the location model $Y = X'\beta_0 + V$, where $X$ is independent of $V$, so the conditional quantile function of the outcome variable $Y$ given the conditioning variable $X$ is given by $X'\beta_0 + \alpha_0(\cdot)$, where $\alpha_0(\cdot) = Q_V(\cdot)$ is the quantile function of $V$. Here we can take the moment functions corresponding to using least squares to estimate $\beta_0$ and sample quantiles of residuals to estimate $\alpha_0$.

We assume that the density of $V = Y - X'\beta_0$, $f_V(\cdot)$, is uniformly bounded and is continuous at $\alpha_0(u)$ uniformly in $u \in T$; moreover, $\inf_{u\in T} f_V(\alpha_0(u)) > c > 0$; $EXX'$ is finite and of full rank, and $EY^2 < \infty$. The true parameter value $(\alpha_0(u), \beta_0')'$ solves $Eg(W,\alpha,\beta,u) = 0$, and we assume that the parameter space $\Theta$ is such that $(\alpha_0(u), \beta_0')' \in \text{interior}\,\Theta$ for each $u \in (0,1)$.

Lemma 15. Conditions Z.1-Z.2(b) hold for this example with moment function given by

$$g(W,\alpha,\beta,u) = \big[(u - 1\{Y - X'\beta \le \alpha\}), (Y - X'\beta)X'\big]', \tag{F.4}$$

$T = (0,1)$, $Q_Y(u|x) = x'\beta_0 + \alpha_0(u)$,

$$G(\alpha_0(u),\beta_0,u) = -\begin{pmatrix} f_V(\alpha_0(u)) & f_V(\alpha_0(u))E[X]' \\ 0_{p\times 1} & EXX' \end{pmatrix}, \tag{F.5}$$

and

$$\Omega(u,\tilde u) = \begin{pmatrix} \min(u,\tilde u) - u\tilde u & -E[V1\{V \le \alpha_0(u)\}]E[X]' \\ -E[V1\{V \le \alpha_0(\tilde u)\}]E[X] & E[V^2]EXX' \end{pmatrix}. \tag{F.6}$$
Proof of Lemma 15. The proof follows analogously to the proof of Lemma 14. Uniqueness of roots can also be argued similarly, with $\beta_0$ uniquely solving the least squares normal equations and $\alpha_0(u)$ uniquely solving the quantile equation. □
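The two-step estimator implied by the moment function (F.4) can be sketched as follows: least squares for $\beta_0$, then the sample $u$-quantile of the residuals for $\alpha_0(u)$, giving $\widehat{Q}_Y(u|x) = x'\hat\beta + \hat\alpha(u)$. The scalar-regressor setup, the function names, and the data are hypothetical; the data are built so that the residuals are exactly $\pm 1$, which makes the output easy to verify by hand.

```python
def ols_simple(xs, ys):
    """Least squares of y on (1, x): the beta-block of the moment function (F.4)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def sample_quantile(values, u):
    """inf{v : F_n(v) >= u}, an approximate zero of E_n[u - 1{V <= a}] in a."""
    vs = sorted(values)
    n = len(vs)
    for i, v in enumerate(vs, start=1):
        if i / n >= u:
            return v
    return vs[-1]


def location_quantile(xs, ys, u, x0):
    """Q_hat(u | x0) = b0 + b1 * x0 + alpha_hat(u) in the location model of Example 1."""
    b0, b1 = ols_simple(xs, ys)
    residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    return b0 + b1 * x0 + sample_quantile(residuals, u)


# Made-up data: y = 1 + 2x +/- 1, balanced so that least squares recovers (1, 2) exactly.
xs = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
ys = [0.0, 2.0, 2.0, 4.0, 4.0, 6.0, 6.0, 8.0]
```

Here `location_quantile(xs, ys, 0.25, 2.0)` returns 4.0 and `location_quantile(xs, ys, 0.75, 2.0)` returns 6.0: the fitted line is $1 + 2x$ and the residual quantiles are $-1$ and $1$. Because the model is a pure location shift, all conditional quantile curves are parallel, which is the restriction that distinguishes Example 1 from quantile regression.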

F.2. Examples of conditional distribution function estimation methods. We consider the distribution regression model described in the text and an alternative estimator for the duration model based on distribution regression.

Example 4. Distribution regression. The conditional distribution function of the outcome variable $Y$ given the covariate vector $X$ is given by $\Lambda(X'\beta_0(\cdot))$, where $\Lambda$ is either the probit or the logit link function. Here we can take the moment functions corresponding to pointwise maximum likelihood estimation:

$$g(W,\beta,y) = \frac{\Lambda(X'\beta) - 1\{Y \le y\}}{\Lambda(X'\beta)[1 - \Lambda(X'\beta)]}\,\lambda(X'\beta)X, \tag{F.7}$$

where $\lambda$ is the derivative of $\Lambda$. Let $\mathcal{Y}$ be either a finite set or a bounded open subset of $\mathbb{R}$. For the latter case we assume that the conditional distribution function $y \mapsto F_Y(y|X)$ admits a density $y \mapsto f_Y(y|X)$, which is continuous at each $y \in \mathcal{Y}$, a.s. Moreover, $EXX'$ is finite and of full rank; the true parameter value $\beta_0(y)$ belongs to the interior of the parameter space $\Theta$ for each $y \in \mathcal{Y}$; and $\Lambda(X'\beta)(1 - \Lambda(X'\beta)) > c > 0$ uniformly on $\beta \in \Theta$, a.s.

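Lemma 16. Conditions Z.1-Z.2(a) hold for this example with moment function given by (F.7), $T = \mathcal{Y}$, $u = y$, $F_Y(y|x) = \Lambda(x'\beta_0(y))$,

$$G(\beta_0(y),y) := E\left[\frac{\lambda(X'\beta_0(y))^2}{\Lambda(X'\beta_0(y))[1 - \Lambda(X'\beta_0(y))]}\,XX'\right],$$

and, for $y \le \tilde y$,

$$\Omega(y,\tilde y) := E\left[\frac{\Lambda(X'\beta_0(y))[1 - \Lambda(X'\beta_0(\tilde y))]\,\lambda(X'\beta_0(y))\,\lambda(X'\beta_0(\tilde y))}{\Lambda(X'\beta_0(y))[1 - \Lambda(X'\beta_0(y))]\,\Lambda(X'\beta_0(\tilde y))[1 - \Lambda(X'\beta_0(\tilde y))]}\,XX'\right].$$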

Proof of Lemma 16. We consider the case where $\mathcal{Y}$ is a bounded open subset of $\mathbb{R}$. The case where $\mathcal{Y}$ is a finite set is simpler and follows similarly.

To show Z.1, we need to verify conditions on the derivatives of the map $(\beta,y) \mapsto Eg(W,\beta,y)$. By a straightforward calculation we have that at $(\beta,y) = (\beta_0(y),y)$,

$$\frac{\partial}{\partial(\beta',y)}Eg(W,\beta,y) = \Big[E\Big[\frac{\partial}{\partial\beta'}g(W,\beta,y)\Big], \frac{\partial}{\partial y}Eg(W,\beta,y)\Big] = [G(\beta,y), R(\beta,y)],$$

where, for $H(z) = \lambda(z)/\{\Lambda(z)[1 - \Lambda(z)]\}$ and $h(z) = dH(z)/dz$,

$$G(\beta,y) = E\big[\{h(X'\beta)[\Lambda(X'\beta) - 1\{Y \le y\}] + H(X'\beta)\lambda(X'\beta)\}XX'\big], \qquad R(\beta,y) = -E[H(X'\beta)f_Y(y|X)X].$$

Both terms are continuous in $(\beta,y)$ at $(\beta_0(y),y)$ for each $y \in \mathcal{Y}$. This follows from the dominated convergence theorem and the following ingredients: (1) a.s. continuity of the map $(\beta,y) \mapsto \partial g(W,\beta,y)/\partial\beta'$, (2) domination of $\|\partial g(W,\beta,y)/\partial\beta'\|$ by the integrable function $\text{const}\cdot\|X\|^2$, (3) a.s. continuity of the conditional density function $y \mapsto f_Y(y|X)$, and (4) $\Lambda(X'\beta)(1 - \Lambda(X'\beta)) > c > 0$ uniformly on $\beta \in \Theta$, a.s. Finally, also note that the solution $\beta_0(y)$ to $Eg(W,\beta,y) = 0$ is unique for each $y \in \mathcal{Y}$ because it is a root of the gradient of a convex function.

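To show Z.1(iii), we verify that the function class $\mathcal{G}$ is P-Donsker with a square-integrable envelope. The function classes $\mathcal{F}_1 = \{X'\beta, \beta \in \Theta\}$, $\mathcal{F}_2 = \{1\{Y \le y\}, y \in \mathcal{Y}\}$, and $\{X_j\}$, $j = 1,\ldots,p$, are VC classes of functions. The final class

$$\mathcal{G} = \left\{\frac{\Lambda(\mathcal{F}_1) - \mathcal{F}_2}{\Lambda(\mathcal{F}_1)[1 - \Lambda(\mathcal{F}_1)]}\,\lambda(\mathcal{F}_1)X_j, \ j = 1,\ldots,p\right\}$$

is a Lipschitz transformation of VC classes with Lipschitz coefficient bounded by $c\max_j|X_j|$ and envelope function $c'\max_j|X_j|$, which are square-integrable; here $c$ and $c'$ are some positive constants. Hence $\mathcal{G}$ is Donsker by Example 19.9 in van der Vaart (1998). Finally, the map

$$(\beta,y) \mapsto \frac{\Lambda(X'\beta) - 1\{Y \le y\}}{\Lambda(X'\beta)[1 - \Lambda(X'\beta)]}\,\lambda(X'\beta)X$$

is continuous at each $(\beta,y) \in \Theta \times \mathcal{Y}$ with probability one by the absolute continuity of the conditional distribution of $Y$ and by the assumption that $\Lambda(X'\beta)(1 - \Lambda(X'\beta)) > c > 0$ uniformly on $\beta \in \Theta$, a.s.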

To show Z.2(a), we note that the map $(x,\theta) \mapsto \Lambda(x'\theta)$ trivially verifies the hypotheses of Z.2(a) provided the set $\mathcal{X}$ is compact. □
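Distribution regression at a single threshold $y$ is a binary-response fit of $1\{Y \le y\}$ on $X$. The sketch below is hypothetical in its details (logit link, $X = (1,x)$ with scalar $x$, plain Newton iterations without step-size safeguards) and uses the fact that for the logit link $\lambda = \Lambda(1-\Lambda)$, so the weight in (F.7) equals one. Repeating the fit over a grid of thresholds traces out $y \mapsto \widehat{F}_Y(y|x) = \Lambda(x'\hat\beta(y))$.

```python
import math


def logistic_cdf(z):
    """Logistic distribution function Lambda(z), written to avoid overflow."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dr_logit(xs, ys, y, iters=50):
    """Logit distribution regression at threshold y with X = (1, x).

    With the logit link lambda = Lambda (1 - Lambda), so the weight in (F.7) is one and
    the moment is (Lambda(X'b) - 1{Y <= y}) X. Newton steps solve E_n[g] = 0, using
    the closed-form inverse of the 2 x 2 Jacobian."""
    b0 = b1 = 0.0
    for _ in range(iters):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(xs, ys):
            p = logistic_cdf(b0 + b1 * x)
            r = p - (1.0 if yi <= y else 0.0)  # moment contribution before multiplying by X
            w = p * (1.0 - p)                  # derivative of Lambda at X'b
            s0, s1 = s0 + r, s1 + r * x
            h00, h01, h11 = h00 + w, h01 + w * x, h11 + w * x * x
        det = h00 * h11 - h01 * h01
        b0, b1 = b0 - (h11 * s0 - h01 * s1) / det, b1 - (h00 * s1 - h01 * s0) / det
    return b0, b1


# Made-up data: for x = 0, three of four outcomes are <= 5; for x = 1, one of four.
xs = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
ys = [1.0, 2.0, 3.0, 9.0, 2.0, 7.0, 8.0, 9.0]
b0, b1 = dr_logit(xs, ys, 5.0)
```

With a binary regressor the model is saturated, so the fitted probabilities reproduce the group frequencies: $\Lambda(\hat b_0) = 0.75$ and $\Lambda(\hat b_0 + \hat b_1) = 0.25$, that is, $\hat b_0 = \log 3$.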

Example 3b. Duration regression. An alternative to the proportional hazard model in duration and survival analysis is to specify the conditional distribution function of the duration $Y$ given the covariate vector $X$ as $\Lambda(\alpha_0(\cdot) + X'\beta_0)$, where $\Lambda$ is either the probit or the logit link function. We normalize $\alpha_0(y_0) = 0$ at some $y_0 \in \mathcal{Y}$. Here we can take the following moment functions:

$$g(W,\alpha,\beta,y) = \begin{pmatrix} \dfrac{\Lambda(\alpha + X'\beta) - 1\{Y \le y\}}{\Lambda(\alpha + X'\beta)[1 - \Lambda(\alpha + X'\beta)]}\,\lambda(\alpha + X'\beta) \\[2ex] \dfrac{\Lambda(X'\beta) - 1\{Y \le y_0\}}{\Lambda(X'\beta)[1 - \Lambda(X'\beta)]}\,\lambda(X'\beta)X \end{pmatrix},$$

where $\lambda$ is the derivative of $\Lambda$. The first set of equations is used for estimation of $\alpha_0(y)$ and the second for estimation of $\beta_0$.

Let $\mathcal{Y}$ be either a finite set or a bounded open subset of $\mathbb{R}$. For the latter case we assume that the conditional distribution function $y \mapsto F_Y(y|X)$ admits a density $y \mapsto f_Y(y|X)$, which is continuous at each $y \in \mathcal{Y}$, a.s. Moreover, $EXX'$ is finite and of full rank; the true parameter value $(\alpha_0(y), \beta_0')'$ belongs to the interior of the parameter space $\Theta$ for each $y \in \mathcal{Y}$; and $\Lambda(\alpha + X'\beta)(1 - \Lambda(\alpha + X'\beta)) > c > 0$ uniformly on $(\alpha,\beta')' \in \Theta$, a.s.

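Lemma 17. Conditions Z.1-Z.2(a) hold for this example with the moment functions given above, $T = \mathcal{Y}$, $u = y$, $F_Y(y|x) = \Lambda(\alpha_0(y) + x'\beta_0)$,

$$G(\alpha_0(y),\beta_0,y) = E\left[\frac{\partial}{\partial(\alpha,\beta')}g(W,\alpha_0(y),\beta_0,y)\right],$$

and $\Omega(y,\tilde y) = E[g(W,\alpha_0(y),\beta_0,y)\,g(W,\alpha_0(\tilde y),\beta_0,\tilde y)']$.

Proof of Lemma 17. The proof follows analogously to the proof of Lemma 16. □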